
The regulatory genome constrains protein sequence evolution: implications for the search for disease-associated genes

The development of explanatory models of protein sequence evolution has broad implications for our understanding of cellular biology, population history, and disease etiology. Here we analyze the GTEx transcriptome resource to quantify the effect of the transcriptome on protein sequence evolution in...

Bibliographic Details
Main authors: Evans, Patrick; Cox, Nancy J.; Gamazon, Eric R.
Format: Online Article Text
Language: English
Published: PeerJ Inc. 2020
Subjects: Bioinformatics
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7380284/
https://www.ncbi.nlm.nih.gov/pubmed/32765967
http://dx.doi.org/10.7717/peerj.9554
_version_ 1783562822837338112
author Evans, Patrick
Cox, Nancy J.
Gamazon, Eric R.
collection PubMed
description The development of explanatory models of protein sequence evolution has broad implications for our understanding of cellular biology, population history, and disease etiology. Here we analyze the GTEx transcriptome resource to quantify the effect of the transcriptome on protein sequence evolution in a multi-tissue framework. We find substantial variation among the central nervous system tissues in the effect of expression variance on evolutionary rate, with highly variable genes in the cortex showing significantly greater purifying selection than highly variable genes in subcortical regions (Mann–Whitney U p = 1.4 × 10⁻⁴). The remaining tissues cluster in observed expression correlation with evolutionary rate, enabling evolutionary analysis of genes in diverse physiological systems, including digestive, reproductive, and immune systems. Importantly, the tissue in which a gene attains its maximum expression variance significantly varies (p = 5.55 × 10⁻²⁸⁴) with evolutionary rate, suggesting a tissue-anchored model of protein sequence evolution. Using a large-scale reference resource, we show that the tissue-anchored model provides a transcriptome-based approach to predicting the primary affected tissue of developmental disorders. Using gradient boosted regression trees to model evolutionary rate under a range of model parameters, selected features explain up to 62% of the variation in evolutionary rate and provide additional support for the tissue model. Finally, we investigate several methodological implications, including the importance of evolutionary-rate-aware gene expression imputation models using genetic data for improved search for disease-associated genes in transcriptome-wide association studies. Collectively, this study presents a comprehensive transcriptome-based analysis of a range of factors that may constrain molecular evolution and proposes a novel framework for the study of gene function and disease mechanism.
format Online
Article
Text
id pubmed-7380284
institution National Center for Biotechnology Information
language English
publishDate 2020
publisher PeerJ Inc.
record_format MEDLINE/PubMed
spelling pubmed-7380284 2020-08-05 PeerJ Bioinformatics PeerJ Inc. 2020-07-21 /pmc/articles/PMC7380284/ /pubmed/32765967 http://dx.doi.org/10.7717/peerj.9554 Text en ©2020 Evans et al. https://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, reproduction and adaptation in any medium and for any purpose provided that it is properly attributed. For attribution, the original author(s), title, publication source (PeerJ) and either DOI or URL of the article must be cited.
title The regulatory genome constrains protein sequence evolution: implications for the search for disease-associated genes
topic Bioinformatics