Characterization of intrinsically disordered regions in proteins informed by human genetic diversity

All proteomes contain both proteins and polypeptide segments that don’t form a defined three-dimensional structure yet are biologically active, called intrinsically disordered proteins and regions (IDPs and IDRs). Most of these IDPs/IDRs lack useful functional annotation, limiting our understanding of...

Bibliographic Details
Main authors: Ahmed, Shehab S., Rifat, Zaara T., Lohia, Ruchi, Campbell, Arthur J., Dunker, A. Keith, Rahman, M. Sohel, Iqbal, Sumaiya
Format: Online Article Text
Language: English
Published: Public Library of Science 2022
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8942211/
https://www.ncbi.nlm.nih.gov/pubmed/35275927
http://dx.doi.org/10.1371/journal.pcbi.1009911
collection PubMed
description All proteomes contain both proteins and polypeptide segments that don’t form a defined three-dimensional structure yet are biologically active, called intrinsically disordered proteins and regions (IDPs and IDRs). Most of these IDPs/IDRs lack useful functional annotation, limiting our understanding of their importance for organism fitness. Here we characterized IDRs using protein sequence annotations of functional sites and regions available in the UniProt knowledgebase (“UniProt features”: active site, ligand-binding pocket, regions mediating protein-protein interactions, etc.). By measuring the statistical enrichment of twenty-five UniProt features in 981 IDRs of 561 human proteins, we identified eight features that are commonly located in IDRs. We then collected genetic variant data from general-population and patient-based databases and evaluated the prevalence of population and pathogenic variations in IDPs/IDRs. We observed that some IDRs tolerate 2 to 12 times more single amino acid-substituting missense mutations than synonymous changes in the general population. However, we also found that 37% of all germline pathogenic mutations are located in disordered regions of 96 proteins. Based on the observed-to-expected frequency of mutations, we categorized 34 IDRs in 20 proteins (DDX3X, KIT, RB1, etc.) as intolerant to mutation. Finally, using statistical analysis and a machine learning approach, we demonstrate that mutation-intolerant IDRs carry a distinct signature of functional features. Our study presents a novel approach to assign functional importance to IDRs by leveraging the wealth of available genetic data, which will aid in a deeper understanding of the role of IDRs in biological processes and disease mechanisms.
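The two ratios named in the description (missense versus synonymous counts, and observed-to-expected mutation frequency) can be illustrated with a minimal Python sketch. This is not the authors' pipeline: the `RegionCounts` type, the example counts, the source of the expected count, and the 0.5 cutoff are all hypothetical and chosen only to show the arithmetic.

```python
from dataclasses import dataclass


@dataclass
class RegionCounts:
    """Hypothetical variant counts for one disordered region (illustrative only)."""
    name: str
    missense: int             # observed missense variants in the region
    synonymous: int           # observed synonymous variants in the region
    expected_missense: float  # assumed to come from some background mutation model


def missense_to_synonymous(region: RegionCounts) -> float:
    """Ratio of observed missense to observed synonymous variants."""
    if region.synonymous == 0:
        raise ValueError("no synonymous variants; ratio is undefined")
    return region.missense / region.synonymous


def observed_to_expected(region: RegionCounts) -> float:
    """Ratio of observed to expected missense variants."""
    if region.expected_missense <= 0:
        raise ValueError("expected count must be positive")
    return region.missense / region.expected_missense


def is_intolerant(region: RegionCounts, threshold: float = 0.5) -> bool:
    """Flag a region whose observed/expected ratio falls below a cutoff.

    The default threshold is an arbitrary placeholder, not a value
    taken from the article.
    """
    return observed_to_expected(region) < threshold


# Made-up counts for two regions, purely for illustration.
tolerant = RegionCounts("IDR_A", missense=24, synonymous=4, expected_missense=20.0)
constrained = RegionCounts("IDR_B", missense=3, synonymous=5, expected_missense=12.0)

for region in (tolerant, constrained):
    print(
        region.name,
        missense_to_synonymous(region),
        observed_to_expected(region),
        is_intolerant(region),
    )
```

With these invented numbers, `IDR_A` carries six missense variants per synonymous one and is not flagged, while `IDR_B` shows a quarter of its expected missense count and falls under the placeholder cutoff.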
id pubmed-8942211
institution National Center for Biotechnology Information
record_format MEDLINE/PubMed
spelling pubmed-8942211 2022-03-24 Characterization of intrinsically disordered regions in proteins informed by human genetic diversity. PLoS Comput Biol, Research Article. Public Library of Science, 2022-03-11. /pmc/articles/PMC8942211/ /pubmed/35275927 http://dx.doi.org/10.1371/journal.pcbi.1009911 Text en. © 2022 Ahmed et al. This is an open access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.