
Accurate and fast methods to estimate the population mutation rate from error prone sequences

BACKGROUND: The population mutation rate (θ) remains one of the most fundamental parameters in genetics, ecology, and evolutionary biology. However, its accurate estimation can be seriously compromised when working with error prone data such as expressed sequence tags, low coverage draft sequences,...


Bibliographic details
Main authors:	Knudsen, Bjarne; Miyamoto, Michael M
Format:	Text
Language:	English
Published:	BioMed Central 2009
Subjects:	Methodology article
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2746815/ https://www.ncbi.nlm.nih.gov/pubmed/19671163 http://dx.doi.org/10.1186/1471-2105-10-247

author	Knudsen, Bjarne; Miyamoto, Michael M
collection	PubMed
description	BACKGROUND: The population mutation rate (θ) remains one of the most fundamental parameters in genetics, ecology, and evolutionary biology. However, its accurate estimation can be seriously compromised when working with error prone data such as expressed sequence tags, low coverage draft sequences, and other such unfinished products. This study is premised on the simple idea that a random sequence error due to a chance accident during data collection or recording will be distributed within a population dataset as a singleton (i.e., as a polymorphic site where one sampled sequence exhibits a unique base relative to the common nucleotide of the others). Thus, one can avoid these random errors by ignoring the singletons within a dataset. RESULTS: This strategy is implemented under an infinite sites model that focuses on only the internal branches of the sample genealogy where a shared polymorphism can arise (i.e., a variable site where each alternative base is represented by at least two sequences). This approach is first used to derive independently the same new Watterson and Tajima estimators of θ, as recently reported by Achaz [1] for error prone sequences. It is then used to modify the recent, full, maximum-likelihood model of Knudsen and Miyamoto [2], which incorporates various factors for experimental error and design with those for coalescence and mutation. These new methods are all accurate and fast according to evolutionary simulations and analyses of a real complex population dataset for the California seahare. CONCLUSION: In light of these results, we recommend the use of these three new methods for the determination of θ from error prone sequences. In particular, we advocate the new maximum likelihood model as a starting point for the further development of more complex coalescent/mutation models that also account for experimental error and design.
format	Text
id	pubmed-2746815
institution	National Center for Biotechnology Information
language	English
publishDate	2009
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-2746815 2009-09-19 BMC Bioinformatics Methodology article BioMed Central 2009-08-11 /pmc/articles/PMC2746815/ /pubmed/19671163 http://dx.doi.org/10.1186/1471-2105-10-247 Text en Copyright ©2009 Knudsen and Miyamoto; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title	Accurate and fast methods to estimate the population mutation rate from error prone sequences
topic	Methodology article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2746815/ https://www.ncbi.nlm.nih.gov/pubmed/19671163 http://dx.doi.org/10.1186/1471-2105-10-247
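The description states that random sequencing errors appear as singletons, so θ can be estimated from shared polymorphisms only. The following is a minimal sketch of that idea for the Watterson and Tajima (pairwise) estimators. It is not the authors' software, and the function names are hypothetical.

Assumptions: a clean, gap-free alignment of n ≥ 4 sequences, with biallelic sites under the infinite-sites model. The normalising constants are derived here from the standard neutral expectation E[ξ_i] = θ/i, where ξ_i is the number of sites at which i sequences carry the derived base. A singleton site (minor count 1) is then i = 1 or i = n − 1, which gives two estimators:

- Watterson-type: θ_W = (S − η₁) / (a_n − n/(n−1)).
- Pairwise: θ_π = Σ m(n−m) / (n(n−3)/2), summed over shared sites.

Both should correspond to the singleton-free estimators the description attributes to Achaz [1], but please check them against the paper before relying on them.

```python
from collections import Counter
from typing import List, Sequence


def site_minor_counts(alignment: Sequence[str]) -> List[int]:
    """Minor-allele count at every biallelic column of an aligned sample.

    Assumption (infinite-sites model): monomorphic columns and columns with
    more than two distinct bases are skipped.
    """
    if not alignment:
        raise ValueError("alignment is empty")
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all sequences must have the same length")
    minor_counts = []
    for column in zip(*alignment):
        counts = Counter(column)
        if len(counts) == 2:
            minor_counts.append(min(counts.values()))
    return minor_counts


def harmonic(n: int) -> float:
    """Watterson's constant a_n = sum_{i=1}^{n-1} 1/i."""
    return sum(1.0 / i for i in range(1, n))


def _check_sample_size(n: int) -> None:
    # With n < 4 no site can have both alleles in >= 2 sequences,
    # so the singleton-free denominators below would be zero.
    if n < 4:
        raise ValueError("need at least 4 sequences once singletons are ignored")


def theta_watterson_no_singletons(alignment: Sequence[str]) -> float:
    """Hypothetical Watterson-type estimate of theta from shared sites only."""
    n = len(alignment)
    _check_sample_size(n)
    shared = sum(1 for m in site_minor_counts(alignment) if m >= 2)  # S - eta_1
    # E[S - eta_1] = theta * (a_n - 1 - 1/(n-1)) = theta * (a_n - n/(n-1))
    return shared / (harmonic(n) - n / (n - 1))


def theta_tajima_no_singletons(alignment: Sequence[str]) -> float:
    """Hypothetical Tajima-type (pairwise) estimate of theta from shared sites only."""
    n = len(alignment)
    _check_sample_size(n)
    # A site with minor count m separates m * (n - m) pairs of sequences.
    pairs = sum(m * (n - m) for m in site_minor_counts(alignment) if m >= 2)
    # E[pairs] = theta * sum_{i=2}^{n-2} (n - i) = theta * n * (n - 3) / 2
    return pairs / (n * (n - 3) / 2)


if __name__ == "__main__":
    # Toy alignment: two shared sites (2|2 splits) and one singleton (last column).
    sample = ["AAAA", "AAAT", "ACCA", "ACCA"]
    print(theta_watterson_no_singletons(sample))
    print(theta_tajima_no_singletons(sample))
```

In the toy alignment, the singleton in the last column is ignored, and both functions return approximately 4.0. This is θ for the whole region; divide by the alignment length for a per-site value.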

