Roles of Solvent Accessibility and Gene Expression in Modeling Protein Sequence Evolution

Models of protein evolution tend to ignore functional constraints, although structural constraints are sometimes incorporated. Here we propose a probabilistic framework for codon substitution that evaluates joint effects of relative solvent accessibility (RSA), a structural constraint; and gene expr...

Bibliographic Details
Main authors: Wang, Kuangyu, Yu, Shuhui, Ji, Xiang, Lakner, Clemens, Griffing, Alexander, Thorne, Jeffrey L
Format: Online Article Text
Language: English
Published: Libertas Academica 2015
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4415675/
https://www.ncbi.nlm.nih.gov/pubmed/25987828
http://dx.doi.org/10.4137/EBO.S22911
author Wang, Kuangyu
Yu, Shuhui
Ji, Xiang
Lakner, Clemens
Griffing, Alexander
Thorne, Jeffrey L
collection PubMed
description Models of protein evolution tend to ignore functional constraints, although structural constraints are sometimes incorporated. Here we propose a probabilistic framework for codon substitution that evaluates the joint effects of relative solvent accessibility (RSA), a structural constraint, and gene expression, a functional constraint. First, we explore the relationship between RSA and codon usage at the genomic scale as well as at the individual gene scale. Motivated by these results, we construct our framework by determining how probable an amino acid is, given RSA and gene expression, and then evaluating the relative probability of observing a codon compared to its synonymous codons. We come to the biologically plausible conclusion that both RSA and gene expression are related to amino acid frequencies but that, among synonymous codons, the relative probability of a particular codon is more closely related to gene expression than to RSA. To illustrate the potential applications of our framework, we propose a new codon substitution model. Using this model, we obtain estimates of 2Ns, the product of the effective population size N and the relative fitness difference s of an allele. For a training data set consisting of human proteins with known structures and expression data, 2Ns is estimated separately for synonymous and nonsynonymous substitutions in each protein. We then contrast the patterns of synonymous and nonsynonymous 2Ns estimates across proteins while also taking the gene expression levels of the proteins into account. We conclude that our 2Ns estimates are too concentrated around 0, and we discuss potential explanations for this lack of variability.
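The two-stage construction described in the abstract (first the probability of an amino acid given RSA and gene expression, then the relative probability of a codon among its synonymous codons) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the category labels, the probability tables, and the function names are hypothetical and are not taken from the article. In particular, letting the second stage depend only on expression is a simplification inspired by the abstract's conclusion, not the authors' stated parameterization.

```python
# Hypothetical toy tables; the numbers are invented for illustration only.

# Stage 1: P(amino acid | RSA category, expression category)
P_AA = {
    ("buried", "high"): {"L": 0.7, "K": 0.3},
    ("exposed", "high"): {"L": 0.4, "K": 0.6},
}

# Stage 2: P(codon | amino acid, expression category), i.e. the relative
# probability of a codon among the synonymous codons of that amino acid.
# Assumption: this stage ignores RSA entirely.
P_CODON_GIVEN_AA = {
    ("L", "high"): {"CTG": 0.8, "CTA": 0.2},
    ("K", "high"): {"AAG": 0.6, "AAA": 0.4},
}


def codon_probability(codon, aa, rsa, expr):
    """P(codon | RSA, expr) = P(aa | RSA, expr) * P(codon | aa, expr)."""
    return P_AA[(rsa, expr)][aa] * P_CODON_GIVEN_AA[(aa, expr)][codon]


def codon_distribution(rsa, expr):
    """Full codon distribution for one (RSA, expression) combination."""
    dist = {}
    for aa, p_aa in P_AA[(rsa, expr)].items():
        for codon, p_codon in P_CODON_GIVEN_AA[(aa, expr)].items():
            dist[codon] = p_aa * p_codon
    return dist


if __name__ == "__main__":
    for rsa in ("buried", "exposed"):
        print(rsa, codon_distribution(rsa, "high"))
```

Because each stage is a proper conditional distribution, the product sums to 1 over all codons for any fixed RSA and expression category, which is a quick sanity check for a factorization of this kind.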
format Online
Article
Text
id pubmed-4415675
institution National Center for Biotechnology Information
language English
publishDate 2015
publisher Libertas Academica
record_format MEDLINE/PubMed
spelling pubmed-4415675 2015-05-18 Roles of Solvent Accessibility and Gene Expression in Modeling Protein Sequence Evolution Wang, Kuangyu Yu, Shuhui Ji, Xiang Lakner, Clemens Griffing, Alexander Thorne, Jeffrey L Evol Bioinform Online Original Research
Libertas Academica 2015-04-29 /pmc/articles/PMC4415675/ /pubmed/25987828 http://dx.doi.org/10.4137/EBO.S22911 Text en © 2015 the authors, publisher and licensee Libertas Academica Limited. This is an open-access article distributed under the terms of the Creative Commons CC-BY-NC 3.0 License.
title Roles of Solvent Accessibility and Gene Expression in Modeling Protein Sequence Evolution
topic Original Research