The regulatory genome constrains protein sequence evolution: implications for the search for disease-associated genes
The development of explanatory models of protein sequence evolution has broad implications for our understanding of cellular biology, population history, and disease etiology. Here we analyze the GTEx transcriptome resource to quantify the effect of the transcriptome on protein sequence evolution in...
Main Authors: | Evans, Patrick; Cox, Nancy J.; Gamazon, Eric R. |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | PeerJ Inc. 2020 |
Subjects: | Bioinformatics |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7380284/ https://www.ncbi.nlm.nih.gov/pubmed/32765967 http://dx.doi.org/10.7717/peerj.9554 |
_version_ | 1783562822837338112 |
---|---|
author | Evans, Patrick Cox, Nancy J. Gamazon, Eric R. |
author_sort | Evans, Patrick |
collection | PubMed |
description | The development of explanatory models of protein sequence evolution has broad implications for our understanding of cellular biology, population history, and disease etiology. Here we analyze the GTEx transcriptome resource to quantify the effect of the transcriptome on protein sequence evolution in a multi-tissue framework. We find substantial variation among the central nervous system tissues in the effect of expression variance on evolutionary rate, with highly variable genes in the cortex showing significantly greater purifying selection than highly variable genes in subcortical regions (Mann–Whitney U p = 1.4 × 10⁻⁴). The remaining tissues cluster in observed expression correlation with evolutionary rate, enabling evolutionary analysis of genes in diverse physiological systems, including digestive, reproductive, and immune systems. Importantly, the tissue in which a gene attains its maximum expression variance significantly varies (p = 5.55 × 10⁻²⁸⁴) with evolutionary rate, suggesting a tissue-anchored model of protein sequence evolution. Using a large-scale reference resource, we show that the tissue-anchored model provides a transcriptome-based approach to predicting the primary affected tissue of developmental disorders. Using gradient boosted regression trees to model evolutionary rate under a range of model parameters, selected features explain up to 62% of the variation in evolutionary rate and provide additional support for the tissue model. Finally, we investigate several methodological implications, including the importance of evolutionary-rate-aware gene expression imputation models using genetic data for improved search for disease-associated genes in transcriptome-wide association studies. Collectively, this study presents a comprehensive transcriptome-based analysis of a range of factors that may constrain molecular evolution and proposes a novel framework for the study of gene function and disease mechanism. |
format | Online Article Text |
id | pubmed-7380284 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2020 |
publisher | PeerJ Inc. |
record_format | MEDLINE/PubMed |
spelling | pubmed-7380284 2020-08-05 The regulatory genome constrains protein sequence evolution: implications for the search for disease-associated genes Evans, Patrick Cox, Nancy J. Gamazon, Eric R. PeerJ Bioinformatics The development of explanatory models of protein sequence evolution has broad implications for our understanding of cellular biology, population history, and disease etiology. Here we analyze the GTEx transcriptome resource to quantify the effect of the transcriptome on protein sequence evolution in a multi-tissue framework. We find substantial variation among the central nervous system tissues in the effect of expression variance on evolutionary rate, with highly variable genes in the cortex showing significantly greater purifying selection than highly variable genes in subcortical regions (Mann–Whitney U p = 1.4 × 10⁻⁴). The remaining tissues cluster in observed expression correlation with evolutionary rate, enabling evolutionary analysis of genes in diverse physiological systems, including digestive, reproductive, and immune systems. Importantly, the tissue in which a gene attains its maximum expression variance significantly varies (p = 5.55 × 10⁻²⁸⁴) with evolutionary rate, suggesting a tissue-anchored model of protein sequence evolution. Using a large-scale reference resource, we show that the tissue-anchored model provides a transcriptome-based approach to predicting the primary affected tissue of developmental disorders. Using gradient boosted regression trees to model evolutionary rate under a range of model parameters, selected features explain up to 62% of the variation in evolutionary rate and provide additional support for the tissue model. Finally, we investigate several methodological implications, including the importance of evolutionary-rate-aware gene expression imputation models using genetic data for improved search for disease-associated genes in transcriptome-wide association studies. Collectively, this study presents a comprehensive transcriptome-based analysis of a range of factors that may constrain molecular evolution and proposes a novel framework for the study of gene function and disease mechanism. PeerJ Inc. 2020-07-21 /pmc/articles/PMC7380284/ /pubmed/32765967 http://dx.doi.org/10.7717/peerj.9554 Text en ©2020 Evans et al. https://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, reproduction and adaptation in any medium and for any purpose provided that it is properly attributed. For attribution, the original author(s), title, publication source (PeerJ) and either DOI or URL of the article must be cited. |
spellingShingle | Bioinformatics Evans, Patrick Cox, Nancy J. Gamazon, Eric R. The regulatory genome constrains protein sequence evolution: implications for the search for disease-associated genes |
title | The regulatory genome constrains protein sequence evolution: implications for the search for disease-associated genes |
title_sort | regulatory genome constrains protein sequence evolution: implications for the search for disease-associated genes |
topic | Bioinformatics |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7380284/ https://www.ncbi.nlm.nih.gov/pubmed/32765967 http://dx.doi.org/10.7717/peerj.9554 |