
Mapping the Constrained Coding Regions in the Human Genome to Their Corresponding Proteins

Bibliographic Details
Main Authors: Hasenahuer, Marcia A., Sanchis-Juan, Alba, Laskowski, Roman A., Baker, James A., Stephenson, James D., Orengo, Christine A., Raymond, F. Lucy, Thornton, Janet M.
Format: Online Article Text
Language: English
Published: Elsevier 2023
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9875310/
https://www.ncbi.nlm.nih.gov/pubmed/36410474
http://dx.doi.org/10.1016/j.jmb.2022.167892
Collection: PubMed
Description: Constrained Coding Regions (CCRs) in the human genome have been derived from DNA sequencing data of large cohorts of healthy control populations, available in the Genome Aggregation Database (gnomAD) [1]. They identify regions depleted of protein-changing variants and thus mark segments of the genome that have been constrained during human evolution. By mapping these DNA-defined regions from genomic coordinates onto the corresponding protein positions and combining this information with protein annotations, we have explored the distribution of CCRs and compared their co-occurrence with different protein functional features, previously annotated at the amino acid level in public databases. As expected, our results reveal that functional amino acids involved in interactions with DNA/RNA, protein–protein contacts and catalytic sites are the protein features most likely to be highly constrained against variation in the control population. More surprisingly, we also found that linear motifs, linear interacting peptides (LIPs), disorder–order transitions upon binding with other protein partners and liquid–liquid phase separating (LLPS) regions are also strongly associated with high constraint against variation. We also compared intraspecies constraints in the human CCRs with interspecies conservation and functional residues to explore how such CCRs may contribute to the analysis of protein variants. As previously observed, CCRs are only weakly correlated with conservation, suggesting that intraspecies constraints complement interspecies conservation and can provide more information to interpret variant effects.
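The central step described above, mapping DNA-defined regions from genomic coordinates onto protein positions, can be illustrated with a minimal sketch. This is not the authors' pipeline. It assumes a single transcript whose coding exons are given as hypothetical 1-based, inclusive (start, end) genomic intervals, assumes the coding sequence begins in frame at its first coding base, and ignores alternative transcripts, incomplete codons and whether the annotation includes the stop codon. The names `cds_offset` and `region_to_residues` are illustrative only.

```python
from typing import List, Optional, Set, Tuple

# Hypothetical representation: the coding segments of ONE transcript as
# 1-based, inclusive (start, end) genomic intervals in ascending genomic order.
Segment = Tuple[int, int]


def cds_offset(pos: int, segments: List[Segment], strand: str) -> Optional[int]:
    """Return the 0-based offset of genomic `pos` within the coding sequence,
    or None if `pos` lies outside every coding segment."""
    # Walk segments in transcription order: ascending on '+', descending on '-'.
    ordered = segments if strand == "+" else list(reversed(segments))
    offset = 0
    for start, end in ordered:
        if start <= pos <= end:
            # Distance from the segment's 5' end in transcript orientation.
            within = pos - start if strand == "+" else end - pos
            return offset + within
        offset += end - start + 1  # add this segment's full length
    return None


def region_to_residues(region: Segment, segments: List[Segment], strand: str) -> List[int]:
    """Map a genomic region (1-based, inclusive) to the sorted 1-based residue
    positions whose codons overlap it; intronic positions are skipped."""
    residues: Set[int] = set()
    for pos in range(region[0], region[1] + 1):
        off = cds_offset(pos, segments, strand)
        if off is not None:
            residues.add(off // 3 + 1)  # three nucleotides per codon
    return sorted(residues)


if __name__ == "__main__":
    exons = [(100, 108), (200, 211)]  # toy transcript: 21 coding nt, 7 codons
    print(region_to_residues((106, 202), exons, "+"))  # [3, 4]
    print(region_to_residues((209, 211), exons, "-"))  # [1]
```

On the minus strand the segments are walked from the highest coordinate downward, because transcription, and therefore residue numbering, runs in that direction. A region spanning an intron maps to the residues on either side of it, which is why the first toy example returns two non-adjacent genomic stretches as consecutive residues 3 and 4.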
Record ID: pubmed-9875310
Institution: National Center for Biotechnology Information
Record format: MEDLINE/PubMed
Journal: J Mol Biol (Research Article), published 2023-01-30
Rights: © 2022 The Authors. This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).
Topic: Research Article