
Stochastic errors vs. modeling errors in distance based phylogenetic reconstructions

Bibliographic Details
Main authors: Doerr, Daniel; Gronau, Ilan; Moran, Shlomo; Yavneh, Irad
Format: Online; Article; Text
Language: English
Published: BioMed Central 2012
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3538584/
https://www.ncbi.nlm.nih.gov/pubmed/22938153
http://dx.doi.org/10.1186/1748-7188-7-22
collection PubMed
description BACKGROUND: Distance-based phylogenetic reconstruction methods use evolutionary distances between species in order to reconstruct the phylogenetic tree spanning them. There are many different methods for estimating distances from sequence data. These methods assume different substitution models and have different statistical properties. Since the true substitution model is typically unknown, it is important to consider the effect of model misspecification on the performance of a distance estimation method. RESULTS: This paper continues the line of research that attempts to tailor, to each given set of input sequences, a distance function that maximizes the expected topological accuracy of the reconstructed tree. We focus here on the effect of systematic error caused by assuming an inadequate model, but also consider the stochastic error caused by using short sequences. We introduce a theoretical framework for analyzing both sources of error based on the notion of deviation from additivity, which quantifies the contribution of model misspecification to the estimation error. We demonstrate this framework by studying the behavior of the Jukes-Cantor distance function when applied to data generated according to Kimura’s two-parameter model with a transition-transversion bias. We provide both a theoretical derivation for this case and a detailed simulation study on quartet trees. CONCLUSIONS: We demonstrate both analytically and experimentally that by deliberately assuming an oversimplified evolutionary model, it is possible to increase the topological accuracy of reconstruction. Our theoretical framework provides new insights into the mechanisms that enable statistically inconsistent reconstruction methods to outperform consistent methods.
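The following is a minimal sketch, not the authors' code, of the central example in the abstract: the Jukes-Cantor (JC) distance applied to data generated under Kimura's two-parameter (K2P) model. It uses the expected transition and transversion proportions, which corresponds to infinitely long sequences with no stochastic error. Any mismatch it shows is therefore a pure modeling error. The parametrization by a transition/transversion rate ratio, the function names, and the simple "JC of the whole path vs. sum of JC of its parts" check are illustrative assumptions. The check is not necessarily the paper's formal definition of deviation from additivity.

```python
import math


def k2p_expected_pq(d: float, ts_tv_ratio: float) -> tuple[float, float]:
    """Expected proportions of transition (P) and transversion (Q) differences
    across a path of length d (expected substitutions per site) under K2P.

    Assumed parametrization: ts_tv_ratio = alpha / (2 * beta), the transition
    rate divided by the total transversion rate; 0.5 makes K2P equal to JC.
    """
    alpha_t = d * ts_tv_ratio / (ts_tv_ratio + 1.0)  # alpha * tau
    beta_t = d / (2.0 * (ts_tv_ratio + 1.0))         # beta * tau, per transversion type
    p_ts = 0.25 - 0.5 * math.exp(-2.0 * (alpha_t + beta_t)) + 0.25 * math.exp(-4.0 * beta_t)
    q_tv = 0.5 - 0.5 * math.exp(-4.0 * beta_t)
    return p_ts, q_tv


def jc_distance(p: float) -> float:
    """Jukes-Cantor distance from the overall proportion of differing sites."""
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def k2p_distance(p_ts: float, q_tv: float) -> float:
    """Kimura two-parameter distance from transition/transversion proportions."""
    return -0.5 * math.log(1.0 - 2.0 * p_ts - q_tv) - 0.25 * math.log(1.0 - 2.0 * q_tv)


def jc_on_k2p(d: float, ts_tv_ratio: float) -> float:
    """JC distance computed from expected K2P data (no sampling noise), so any
    deviation from d reflects model misspecification only."""
    p_ts, q_tv = k2p_expected_pq(d, ts_tv_ratio)
    return jc_distance(p_ts + q_tv)


if __name__ == "__main__":
    d1, d2 = 0.5, 0.5  # two consecutive edges of a path (hypothetical lengths)
    for ratio in (0.5, 5.0):
        whole = jc_on_k2p(d1 + d2, ratio)
        parts = jc_on_k2p(d1, ratio) + jc_on_k2p(d2, ratio)
        k2p = k2p_distance(*k2p_expected_pq(d1 + d2, ratio))
        print(f"ratio={ratio}: JC(path)={whole:.4f}  JC(d1)+JC(d2)={parts:.4f}  K2P(path)={k2p:.4f}")
```

With a ratio of 0.5, K2P coincides with JC. The JC estimate then equals the true path length and is additive. With a strong transition bias such as 5.0, the JC estimate for the whole path falls below the sum of the estimates for its two halves, while the K2P distance still recovers the true length. The sketch covers only the systematic component. The stochastic component would appear once P and Q are replaced by proportions observed in finite sequences.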
id pubmed-3538584
institution National Center for Biotechnology Information
record_format MEDLINE/PubMed
Record: pubmed-3538584, dated 2013-01-10
Journal: Algorithms Mol Biol (Research)
Published online: BioMed Central, 2012-08-31
Copyright ©2012 Doerr et al.; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
topic Research