
Estimating Gene Expression and Codon-Specific Translational Efficiencies, Mutation Biases, and Selection Coefficients from Genomic Data Alone(‡)


Bibliographic Details
Main authors: Gilchrist, Michael A., Chen, Wei-Chen, Shah, Premal, Landerer, Cedric L., Zaretzki, Russell
Format: Online Article Text
Language: English
Published: Oxford University Press 2015
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4494061/
https://www.ncbi.nlm.nih.gov/pubmed/25977456
http://dx.doi.org/10.1093/gbe/evv087
author Gilchrist, Michael A.
Chen, Wei-Chen
Shah, Premal
Landerer, Cedric L.
Zaretzki, Russell
collection PubMed
description Extracting biologically meaningful information from the continuing flood of genomic data is a major challenge in the life sciences. Codon usage bias (CUB) is a general feature of most genomes and is thought to reflect the effects of both natural selection for efficient translation and mutation bias. Here we present a mechanistically interpretable, Bayesian model (ribosome overhead costs Stochastic Evolutionary Model of Protein Production Rate [ROC SEMPPR]) to extract meaningful information from patterns of CUB within a genome. ROC SEMPPR is grounded in population genetics and allows us to separate the contributions of mutational biases and natural selection against translational inefficiency on a gene-by-gene and codon-by-codon basis. Until now, the primary disadvantage of similar approaches was the need for genome scale measurements of gene expression. Here, we demonstrate that it is possible to extract accurate estimates of codon-specific mutation biases and translational efficiencies while simultaneously generating accurate estimates of gene expression, rather than requiring such information. We demonstrate the utility of ROC SEMPPR using the Saccharomyces cerevisiae S288c genome. When we compare our model fits with previous approaches, we observe exceptionally high agreement between estimates of both codon-specific parameters and gene expression levels ([Formula: see text] in all cases). We also observe strong agreement between our parameter estimates and those derived from alternative data sets. For example, our estimates of mutation bias and those from mutational accumulation experiments are highly correlated ([Formula: see text]). Our estimates of codon-specific translational inefficiencies and tRNA copy number-based estimates of ribosome pausing time ([Formula: see text]), and mRNA and ribosome profiling footprint-based estimates of gene expression ([Formula: see text]) are also highly correlated, thus supporting the hypothesis that selection against translational inefficiency is an important force driving the evolution of CUB. Surprisingly, we find that for particular amino acids, codon usage in highly expressed genes can still be largely driven by mutation bias and that failing to take mutation bias into account can lead to the misidentification of an amino acid’s “optimal” codon. In conclusion, our method demonstrates that an enormous amount of biologically important information is encoded within genome scale patterns of codon usage; accessing this information does not require gene expression measurements, but instead carefully formulated, biologically interpretable models.
format Online
Article
Text
id pubmed-4494061
institution National Center for Biotechnology Information
language English
publishDate 2015
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-4494061 2015-10-26 Genome Biol Evol Research Article Oxford University Press 2015-05-14 /pmc/articles/PMC4494061/ /pubmed/25977456 http://dx.doi.org/10.1093/gbe/evv087 Text en
© The Author(s) 2015. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution. This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/3.0/), which permits non-commercial reuse, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com
title Estimating Gene Expression and Codon-Specific Translational Efficiencies, Mutation Biases, and Selection Coefficients from Genomic Data Alone(‡)
topic Research Article