
Standard maximum likelihood analyses of alignments with gaps can be statistically inconsistent

Background Most statistical methods for phylogenetic estimation in use today treat a gap (generally representing an insertion or deletion, i.e., indel) within the input sequence alignment as missing data. However, the statistical properties of this treatment of indels have not been fully investigate...


Bibliographic Details
Main author: Warnow, Tandy
Format: Online Article Text
Language: English
Published: Public Library of Science 2012
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3299439/
https://www.ncbi.nlm.nih.gov/pubmed/22453901
http://dx.doi.org/10.1371/currents.RRN1308
collection PubMed
description Background: Most statistical methods for phylogenetic estimation in use today treat a gap (generally representing an insertion or deletion, i.e., indel) within the input sequence alignment as missing data. However, the statistical properties of this treatment of indels have not been fully investigated. Results: We prove that maximum likelihood phylogeny estimation, treating indels as missing data, can be statistically inconsistent for a general (and rather simple) model of sequence evolution, even when given the true alignment. Therefore, accurate phylogeny estimation cannot be guaranteed for maximum likelihood analyses, even given arbitrarily long sequences, when indels are present and treated as missing data. Conclusions: Our result shows that the standard statistical techniques used to estimate phylogenies from sequence alignments may have unfavorable statistical properties, even when the sequence alignment is accurate and the assumed substitution model matches the generation model. This suggests that the recent research focus on developing statistical methods that treat indel events properly is an important direction for phylogeny estimation.
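To make "treating a gap as missing data" concrete, the following is a minimal sketch, not the construction used in the article. It assumes the Jukes-Cantor (JC69) substitution model and a two-sequence tree with a single branch of length `t`. Both choices, and all function names, are illustrative assumptions. A gapped position is given a conditional-likelihood vector of all ones, which is the usual missing-data convention in likelihood software.

```python
import math

STATES = "ACGT"

def jc69_prob(i, j, t):
    """JC69 probability of going from state index i to j over branch length t."""
    e = math.exp(-4.0 * t / 3.0)
    return 0.25 + 0.75 * e if i == j else 0.25 - 0.25 * e

def tip_vector(ch):
    """Conditional likelihoods at a leaf; a gap ('-') is treated as missing data."""
    if ch == "-":
        return [1.0] * 4  # every state is equally compatible with a gap
    return [1.0 if s == ch else 0.0 for s in STATES]

def site_likelihood(a, b, t):
    """Likelihood of one alignment column (a, b) for two sequences at distance t."""
    la, lb = tip_vector(a), tip_vector(b)
    return sum(
        0.25 * la[i] * sum(jc69_prob(i, j, t) * lb[j] for j in range(4))
        for i in range(4)
    )

def log_likelihood(seq_a, seq_b, t):
    """Log-likelihood of a pairwise alignment, assuming independent sites."""
    return sum(math.log(site_likelihood(a, b, t)) for a, b in zip(seq_a, seq_b))
```

Under these assumptions a column containing a gap has the same likelihood for every value of `t` (0.25 when one sequence is gapped, 1 when both are), so the pattern of gaps contributes nothing to the estimate. That is the treatment of indels whose statistical properties the abstract calls into question.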
format Online
Article
Text
id pubmed-3299439
institution National Center for Biotechnology Information
language English
publishDate 2012
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-3299439 2012-03-14 Warnow, Tandy PLoS Curr Tree of Life Public Library of Science 2012-03-13 /pmc/articles/PMC3299439/ /pubmed/22453901 http://dx.doi.org/10.1371/currents.RRN1308 Text en http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are properly credited.
topic Tree of Life