
Classification of domains in predicted structures of the human proteome

Bibliographic Details
Main authors: Schaeffer, R. Dustin, Zhang, Jing, Kinch, Lisa N., Pei, Jimin, Cong, Qian, Grishin, Nick V.
Format: Online Article Text
Language: English
Published: National Academy of Sciences 2023
Subjects: Biological Sciences
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10041065/
https://www.ncbi.nlm.nih.gov/pubmed/36917664
http://dx.doi.org/10.1073/pnas.2214069120
collection PubMed
description Recent advances in protein structure prediction have generated accurate structures of previously uncharacterized human proteins. Identifying domains in these predicted structures and classifying them into an evolutionary hierarchy can reveal biological insights. Here, we describe the detection and classification of domains from the human proteome. Our classification indicates that only 62% of residues are located in globular domains. We further classify these globular domains and observe that the majority (65%) can be classified among known folds by sequence, with a smaller fraction (33%) requiring structural data to refine the domain boundaries and/or to support their homology. A relatively small number (966 domains) cannot be confidently assigned using our automatic pipelines, thus demanding manual inspection. We classify 47,576 domains, of which only 23% have been included in experimental structures. A portion (6.3%) of these classified globular domains lacks sequence-based annotation in InterPro. Nearly a quarter (23%) have not been structurally modeled by homology, and they contain 2,540 known disease-causing single amino acid variations whose pathogenesis can now be inferred using AlphaFold (AF) models. A comparison of classified domains from a series of model organisms revealed expansions of several immune response-related domains in humans and a depletion of olfactory receptors. Finally, we use this classification to expand well-known protein families of biological significance. These classifications are presented on the ECOD website (http://prodata.swmed.edu/ecod/index_human.php).
id pubmed-10041065
institution National Center for Biotechnology Information
record_format MEDLINE/PubMed
journal Proc Natl Acad Sci U S A
dates 2023-03-14, 2023-03-21, 2023-09-14
rights Copyright © 2023 the Author(s). Published by PNAS. This article is distributed under Creative Commons Attribution-NonCommercial-NoDerivatives License 4.0 (CC BY-NC-ND) (https://creativecommons.org/licenses/by-nc-nd/4.0/).
topic Biological Sciences