Roles of Solvent Accessibility and Gene Expression in Modeling Protein Sequence Evolution
Models of protein evolution tend to ignore functional constraints, although structural constraints are sometimes incorporated. Here we propose a probabilistic framework for codon substitution that evaluates joint effects of relative solvent accessibility (RSA), a structural constraint; and gene expr...
Main authors: | Wang, Kuangyu; Yu, Shuhui; Ji, Xiang; Lakner, Clemens; Griffing, Alexander; Thorne, Jeffrey L |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Libertas Academica, 2015 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4415675/ https://www.ncbi.nlm.nih.gov/pubmed/25987828 http://dx.doi.org/10.4137/EBO.S22911 |
_version_ | 1782369109487386624 |
---|---|
author | Wang, Kuangyu; Yu, Shuhui; Ji, Xiang; Lakner, Clemens; Griffing, Alexander; Thorne, Jeffrey L |
author_sort | Wang, Kuangyu |
collection | PubMed |
description | Models of protein evolution tend to ignore functional constraints, although structural constraints are sometimes incorporated. Here we propose a probabilistic framework for codon substitution that evaluates the joint effects of relative solvent accessibility (RSA), a structural constraint, and gene expression, a functional constraint. First, we explore the relationship between RSA and codon usage at the genomic scale as well as at the individual gene scale. Motivated by these results, we construct our framework by determining how probable an amino acid is, given RSA and gene expression, and then evaluating the relative probability of observing a codon compared to other synonymous codons. We come to the biologically plausible conclusion that both RSA and gene expression are related to amino acid frequencies, but, among synonymous codons, the relative probability of a particular codon is more closely related to gene expression than to RSA. To illustrate the potential applications of our framework, we propose a new codon substitution model. Using this model, we obtain estimates of 2Ns, the product of the effective population size N and the relative fitness difference s of an allele. For a training data set consisting of human proteins with known structures and expression data, 2Ns is estimated separately for synonymous and nonsynonymous substitutions in each protein. We then contrast the patterns of synonymous and nonsynonymous 2Ns estimates across proteins while also taking gene expression levels of the proteins into account. We conclude that our 2Ns estimates are too concentrated around 0, and we discuss potential explanations for this lack of variability. |
format | Online Article Text |
id | pubmed-4415675 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2015 |
publisher | Libertas Academica |
record_format | MEDLINE/PubMed |
spelling | pubmed-4415675 2015-05-18 Evol Bioinform Online Original Research Libertas Academica 2015-04-29 /pmc/articles/PMC4415675/ /pubmed/25987828 http://dx.doi.org/10.4137/EBO.S22911 Text en © 2015 the authors, publisher and licensee Libertas Academica Limited. This is an open-access article distributed under the terms of the Creative Commons CC-BY-NC 3.0 License. |
title | Roles of Solvent Accessibility and Gene Expression in Modeling Protein Sequence Evolution |
topic | Original Research |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4415675/ https://www.ncbi.nlm.nih.gov/pubmed/25987828 http://dx.doi.org/10.4137/EBO.S22911 |
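
The description field above outlines a two-stage construction: first the probability of an amino acid given RSA and gene expression, then the relative probability of a codon among its synonymous alternatives. The following is a minimal Python sketch of that factorization, not the authors' implementation. All table values, the class labels (`buried`, `exposed`, `high`), and the function names are hypothetical and chosen only for illustration. Conditioning the synonymous-codon stage on expression alone is a simplifying assumption suggested by the abstract's conclusion. The `relative_fixation_rate` helper uses the standard population-genetic expression S / (1 - exp(-S)) with S = 2Ns, which may differ from the parameterization used in the article.

```python
import math

# Hypothetical toy tables; every number here is made up for illustration.
# Stage 1: P(amino acid | RSA class, expression class).
P_AA = {
    ("buried", "high"): {"L": 0.7, "K": 0.3},
    ("exposed", "high"): {"L": 0.3, "K": 0.7},
}

# Stage 2: P(codon | amino acid, expression class), normalised within
# each synonymous family. Ignoring RSA here is an assumption.
P_CODON_GIVEN_AA = {
    ("L", "high"): {"CTG": 0.6, "CTC": 0.4},
    ("K", "high"): {"AAG": 0.75, "AAA": 0.25},
}

CODON_TO_AA = {"CTG": "L", "CTC": "L", "AAG": "K", "AAA": "K"}


def codon_probability(codon, rsa, expr):
    """P(codon | RSA, expression) as the product of the two stages."""
    aa = CODON_TO_AA[codon]
    return P_AA[(rsa, expr)][aa] * P_CODON_GIVEN_AA[(aa, expr)][codon]


def relative_fixation_rate(two_ns):
    """Fixation rate relative to a neutral change, S / (1 - exp(-S)).

    S = 2Ns is the scaled selection coefficient. The two branches are
    algebraically identical and only avoid overflow for large |S|.
    """
    if two_ns == 0.0:
        return 1.0  # neutral limit
    if two_ns > 0:
        return two_ns / -math.expm1(-two_ns)
    return two_ns * math.exp(two_ns) / math.expm1(two_ns)


if __name__ == "__main__":
    for codon in CODON_TO_AA:
        print(codon, codon_probability(codon, "buried", "high"))
    for s in (-2.0, 0.0, 2.0):
        print(s, relative_fixation_rate(s))
```

In this sketch, a 2Ns estimate near 0 gives a relative fixation rate near 1, meaning substitutions behave almost neutrally. This is one way to read the abstract's remark that the estimates are concentrated around 0.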