Mapping the Constrained Coding Regions in the Human Genome to Their Corresponding Proteins
Constrained Coding Regions (CCRs) in the human genome have been derived from DNA sequencing data of large cohorts of healthy control populations, available in the Genome Aggregation Database (gnomAD) [1]. They identify regions depleted of protein-changing variants and thus identify segments of the g...
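Main Authors: | Hasenahuer, Marcia A.; Sanchis-Juan, Alba; Laskowski, Roman A.; Baker, James A.; Stephenson, James D.; Orengo, Christine A.; Raymond, F. Lucy; Thornton, Janet M. |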
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Elsevier 2023 |
Subjects: | Research Article |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9875310/ https://www.ncbi.nlm.nih.gov/pubmed/36410474 http://dx.doi.org/10.1016/j.jmb.2022.167892 |
_version_ | 1784877938193203200 |
---|---|
author | Hasenahuer, Marcia A. Sanchis-Juan, Alba Laskowski, Roman A. Baker, James A. Stephenson, James D. Orengo, Christine A. Raymond, F. Lucy Thornton, Janet M. |
author_sort | Hasenahuer, Marcia A. |
collection | PubMed |
description | Constrained Coding Regions (CCRs) in the human genome have been derived from DNA sequencing data of large cohorts of healthy control populations, available in the Genome Aggregation Database (gnomAD) [1]. They identify regions depleted of protein-changing variants and thus identify segments of the genome that have been constrained during human evolution. By mapping these DNA-defined regions from genomic coordinates onto the corresponding protein positions and combining this information with protein annotations, we have explored the distribution of CCRs and compared their co-occurrence with different protein functional features, previously annotated at the amino acid level in public databases. As expected, our results reveal that functional amino acids involved in interactions with DNA/RNA, protein–protein contacts and catalytic sites are the protein features most likely to be highly constrained for variation in the control population. More surprisingly, we also found that linear motifs, linear interacting peptides (LIPs), disorder–order transitions upon binding with other protein partners and liquid–liquid phase separating (LLPS) regions are also strongly associated with high constraint for variability. We also compared intra-species constraints in the human CCRs with inter-species conservation and functional residues to explore how such CCRs may contribute to the analysis of protein variants. As has been previously observed, CCRs are only weakly correlated with conservation, suggesting that intraspecies constraints complement interspecies conservation and can provide more information to interpret variant effects. |
format | Online Article Text |
id | pubmed-9875310 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2023 |
publisher | Elsevier |
record_format | MEDLINE/PubMed |
spelling | pubmed-9875310 2023-01-30 Mapping the Constrained Coding Regions in the Human Genome to Their Corresponding Proteins Hasenahuer, Marcia A. Sanchis-Juan, Alba Laskowski, Roman A. Baker, James A. Stephenson, James D. Orengo, Christine A. Raymond, F. Lucy Thornton, Janet M. J Mol Biol Research Article Constrained Coding Regions (CCRs) in the human genome have been derived from DNA sequencing data of large cohorts of healthy control populations, available in the Genome Aggregation Database (gnomAD) [1]. They identify regions depleted of protein-changing variants and thus identify segments of the genome that have been constrained during human evolution. By mapping these DNA-defined regions from genomic coordinates onto the corresponding protein positions and combining this information with protein annotations, we have explored the distribution of CCRs and compared their co-occurrence with different protein functional features, previously annotated at the amino acid level in public databases. As expected, our results reveal that functional amino acids involved in interactions with DNA/RNA, protein–protein contacts and catalytic sites are the protein features most likely to be highly constrained for variation in the control population. More surprisingly, we also found that linear motifs, linear interacting peptides (LIPs), disorder–order transitions upon binding with other protein partners and liquid–liquid phase separating (LLPS) regions are also strongly associated with high constraint for variability. We also compared intra-species constraints in the human CCRs with inter-species conservation and functional residues to explore how such CCRs may contribute to the analysis of protein variants. As has been previously observed, CCRs are only weakly correlated with conservation, suggesting that intraspecies constraints complement interspecies conservation and can provide more information to interpret variant effects.
Elsevier 2023-01-30 /pmc/articles/PMC9875310/ /pubmed/36410474 http://dx.doi.org/10.1016/j.jmb.2022.167892 Text en © 2022 The Authors https://creativecommons.org/licenses/by/4.0/ This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/). |
spellingShingle | Research Article Hasenahuer, Marcia A. Sanchis-Juan, Alba Laskowski, Roman A. Baker, James A. Stephenson, James D. Orengo, Christine A. Raymond, F. Lucy Thornton, Janet M. Mapping the Constrained Coding Regions in the Human Genome to Their Corresponding Proteins |
title | Mapping the Constrained Coding Regions in the Human Genome to Their Corresponding Proteins |
title_sort | mapping the constrained coding regions in the human genome to their corresponding proteins |
topic | Research Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9875310/ https://www.ncbi.nlm.nih.gov/pubmed/36410474 http://dx.doi.org/10.1016/j.jmb.2022.167892 |