Correcting the Bias of Empirical Frequency Parameter Estimators in Codon Models

Markov models of codon substitution are powerful inferential tools for studying biological processes such as natural selection and preferences in amino acid substitution. The equilibrium character distributions of these models are almost always estimated using nucleotide frequencies observed in a sequence alignment, primarily as a matter of historical convention. In this note, we demonstrate that a popular class of such estimators are biased, and that this bias has an adverse effect on goodness of fit and estimates of substitution rates. We propose a “corrected” empirical estimator that begins with observed nucleotide counts, but accounts for the nucleotide composition of stop codons. We show via simulation that the corrected estimates outperform the de facto standard [Image: see text] estimates not just by providing better estimates of the frequencies themselves, but also by leading to improved estimation of other parameters in the evolutionary models. On a curated collection of [Image: see text] sequence alignments, our estimators show a significant improvement in goodness of fit compared to the [Image: see text] approach. Maximum likelihood estimation of the frequency parameters appears to be warranted in many cases, albeit at a greater computational cost. Our results demonstrate that there is little justification, either statistical or computational, for continued use of the [Image: see text]-style estimators.

Bibliographic Details
Main authors: Kosakovsky Pond, Sergei; Delport, Wayne; Muse, Spencer V.; Scheffler, Konrad
Format: Text
Language: English
Published: Public Library of Science, 2010
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2912764/
https://www.ncbi.nlm.nih.gov/pubmed/20689581
http://dx.doi.org/10.1371/journal.pone.0011230

collection PubMed
id pubmed-2912764
institution National Center for Biotechnology Information
record_format MEDLINE/PubMed
journal PLoS One, published 2010-07-30
rights Kosakovsky Pond et al.; licensed under http://creativecommons.org/licenses/by/4.0/. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are properly credited.
topic Research Article
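The abstract's central point, that building codon frequencies from observed nucleotide frequencies is biased once stop codons are excluded, can be illustrated with a short Python sketch. It assumes the standard genetic code (stop codons TAA, TAG, TGA) and uses made-up nucleotide frequencies. `product_estimator` follows the widely used position-specific product recipe (commonly called F3×4): multiply the three position-specific nucleotide frequencies and renormalize over the 61 sense codons. `corrected_estimator` is a hypothetical helper that uses iterative proportional fitting to find nucleotide parameters whose implied sense-codon distribution reproduces the observed position-specific frequencies. It is one way to "account for the nucleotide composition of stop codons", not necessarily the authors' procedure.

```python
from itertools import product

BASES = "ACGT"
# Assumption: standard genetic code, so TAA, TAG and TGA are the stop codons.
STOP_CODONS = {"TAA", "TAG", "TGA"}
SENSE_CODONS = [
    "".join(c) for c in product(BASES, repeat=3) if "".join(c) not in STOP_CODONS
]


def product_estimator(params):
    """F3x4-style estimate: multiply position-specific nucleotide frequencies
    and renormalize over the 61 sense codons."""
    raw = {c: params[0][c[0]] * params[1][c[1]] * params[2][c[2]] for c in SENSE_CODONS}
    total = sum(raw.values())
    return {c: w / total for c, w in raw.items()}


def implied_position_frequencies(codon_freqs):
    """Position-specific nucleotide frequencies implied by a codon distribution."""
    margins = [dict.fromkeys(BASES, 0.0) for _ in range(3)]
    for codon, f in codon_freqs.items():
        for pos, base in enumerate(codon):
            margins[pos][base] += f
    return margins


def max_discrepancy(a, b):
    """Largest absolute difference between two sets of position frequencies."""
    return max(abs(a[k][n] - b[k][n]) for k in range(3) for n in BASES)


def corrected_estimator(observed, max_cycles=1000, tol=1e-12):
    """Hypothetical helper: adjust the nucleotide parameters by iterative
    proportional fitting until the sense-codon distribution reproduces the
    observed position-specific frequencies. A generic fitting procedure,
    not necessarily the one used in the paper."""
    params = [dict(p) for p in observed]  # start from the observed frequencies
    for _ in range(max_cycles):
        for k in range(3):
            # Rescaling position k by observed/implied makes that margin match exactly.
            implied = implied_position_frequencies(product_estimator(params))
            for n in BASES:
                params[k][n] *= observed[k][n] / implied[k][n]
        implied = implied_position_frequencies(product_estimator(params))
        if max_discrepancy(implied, observed) < tol:
            break
    return product_estimator(params)


# Made-up observed position-specific nucleotide frequencies (each row sums to 1).
observed = [
    {"A": 0.30, "C": 0.20, "G": 0.30, "T": 0.20},
    {"A": 0.30, "C": 0.25, "G": 0.20, "T": 0.25},
    {"A": 0.20, "C": 0.30, "G": 0.25, "T": 0.25},
]

naive = implied_position_frequencies(product_estimator(observed))
fixed = implied_position_frequencies(corrected_estimator(observed))
print(f"T at codon position 1: observed {observed[0]['T']:.4f}, "
      f"product estimator {naive[0]['T']:.4f}, corrected {fixed[0]['T']:.4f}")
```

With these numbers the product estimator implies a T frequency of about 0.171 at the first codon position, below the observed 0.20, because all three stop codons begin with T and their probability mass is discarded during renormalization. The fitted parameters restore the observed value.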