Estimating Empirical Codon Hidden Markov Models

Bibliographic Details
Main authors:	De Maio, Nicola; Holmes, Ian; Schlötterer, Christian; Kosiol, Carolin
Format:	Online Article Text
Language:	English
Published:	Oxford University Press, 2013
Subjects:	Methods
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3563974/ https://www.ncbi.nlm.nih.gov/pubmed/23188590 http://dx.doi.org/10.1093/molbev/mss266

Collection:	PubMed
Abstract:	Empirical codon models (ECMs) estimated from a large number of globular protein families outperformed mechanistic codon models in their description of the general process of protein evolution. Among other factors, ECMs implicitly model the influence of amino acid properties and multiple nucleotide substitutions (MNS). However, the estimation of ECMs requires large quantities of data, and until recently, only few suitable data sets were available. Here, we take advantage of several new Drosophila species genomes to estimate codon models from genome-wide data. The availability of large numbers of genomes over varying phylogenetic depths in the Drosophila genus allows us to explore various divergence levels. In consequence, we can use these data to determine the appropriate level of divergence for the estimation of ECMs, avoiding overestimation of MNS rates caused by saturation. To account for variation in evolutionary rates along the genome, we develop new empirical codon hidden Markov models (ecHMMs). These models significantly outperform previous ones with respect to maximum likelihood values, suggesting that they provide a better fit to the evolutionary process. Using ECMs and ecHMMs derived from genome-wide data sets, we devise new likelihood ratio tests (LRTs) of positive selection. We found classical LRTs very sensitive to the presence of MNSs, showing high false-positive rates, especially with small phylogenies. The new LRTs are more conservative than the classical ones, having acceptable false-positive rates and reduced power.
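The abstract describes ecHMMs only at a high level. As a generic illustration, and not the authors' implementation, an HMM whose hidden states are discrete rate classes can score an alignment by summing over every assignment of codon sites to classes with the forward algorithm. This sketch assumes the log-likelihood of each alignment column under each rate class has already been computed (for example, by Felsenstein's pruning algorithm with a rate-scaled codon model); the function names `logsumexp` and `hmm_log_likelihood` and all numbers in the example are hypothetical.

```python
import math
from typing import List, Sequence


def _log(p: float) -> float:
    """Natural log that maps probability 0 to -inf instead of raising."""
    return math.log(p) if p > 0 else -math.inf


def logsumexp(values: Sequence[float]) -> float:
    """Numerically stable log(sum(exp(v) for v in values))."""
    m = max(values)
    if m == -math.inf:  # every term has zero probability
        return -math.inf
    return m + math.log(sum(math.exp(v - m) for v in values))


def hmm_log_likelihood(
    site_loglik: Sequence[Sequence[float]],
    init: Sequence[float],
    trans: Sequence[Sequence[float]],
) -> float:
    """Forward algorithm over hidden rate classes (hypothetical helper).

    site_loglik[i][k]: log P(alignment column i | rate class k), assumed precomputed
    init[k]:           probability that the first codon site is in class k
    trans[j][k]:       probability that site i+1 is in class k given site i is in class j
    Returns log P(all columns), summed over every path of rate classes.
    """
    n_states = len(init)
    # alpha[k] = log P(columns 0..i, site i in class k)
    alpha: List[float] = [_log(init[k]) + site_loglik[0][k] for k in range(n_states)]
    for column in site_loglik[1:]:
        # The comprehension reads the previous alpha before it is reassigned.
        alpha = [
            logsumexp([alpha[j] + _log(trans[j][k]) for j in range(n_states)]) + column[k]
            for k in range(n_states)
        ]
    return logsumexp(alpha)


if __name__ == "__main__":
    # Hypothetical two-class example ("slow", "fast"); the numbers are illustrative only.
    init = [0.7, 0.3]
    trans = [[0.95, 0.05], [0.10, 0.90]]
    site_loglik = [[-3.2, -5.1], [-2.8, -4.0], [-9.5, -6.3]]
    print(hmm_log_likelihood(site_loglik, init, trans))
```

Working in log space keeps long genomic alignments from underflowing. With a single rate class the function reduces to the sum of per-site log-likelihoods, i.e. a codon model without rate variation along the sequence.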
Record ID:	pubmed-3563974
Institution:	National Center for Biotechnology Information
Record format:	MEDLINE/PubMed
Journal:	Mol Biol Evol
Publication date:	2013-03 (published online 2012-11-27)
Rights:	© The Author(s) 2012. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution. This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/3.0/), which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.