Phylogenetic inference under varying proportions of indel-induced alignment gaps

BACKGROUND: The effect of alignment gaps on phylogenetic accuracy has been the subject of numerous studies. In this study, we investigated the relationship between the total number of gapped sites and phylogenetic accuracy, when the gaps were introduced (by means of computer simulation) to reflect i...

Bibliographic Details
Main Authors: Dwivedi, Bhakti, Gadagkar, Sudhindra R
Format: Text
Language: English
Published: BioMed Central 2009
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2746219/
https://www.ncbi.nlm.nih.gov/pubmed/19698168
http://dx.doi.org/10.1186/1471-2148-9-211
author Dwivedi, Bhakti
Gadagkar, Sudhindra R
collection PubMed
description BACKGROUND: The effect of alignment gaps on phylogenetic accuracy has been the subject of numerous studies. In this study, we investigated the relationship between the total number of gapped sites and phylogenetic accuracy when the gaps were introduced (by means of computer simulation) to reflect indel (insertion/deletion) events during the evolution of DNA sequences. The resulting (true) alignments were subjected to commonly used gap treatment and phylogenetic inference methods.
RESULTS: (1) In general, there was a strong, almost deterministic, relationship between the amount of gaps in the data and the level of phylogenetic accuracy when the alignments were very "gappy"; (2) gaps resulting from deletions (as opposed to insertions) contributed more to the inaccuracy of phylogenetic inference; (3) the probabilistic methods (Bayesian, PhyML, and "MLε," a method implemented in DNAML in PHYLIP) performed better at most levels of gap percentage than the parsimony (MP) and distance (NJ) methods, with Bayesian analysis being clearly the best; (4) methods that treat gapped sites as missing data yielded less accurate trees than those that attribute phylogenetic signal to the gapped sites (by coding them as binary presence/absence character data, or as in the MLε method); and (5) in general, the accuracy of phylogenetic inference depended upon the amount of available data when the gaps resulted mainly from deletion events, and upon the amount of missing data when insertion events were equally likely to have caused the alignment gaps.
CONCLUSION: When gaps in an alignment are a consequence of indel events in the evolution of the sequences, the accuracy of phylogenetic analysis is likely to improve if: (1) alignment gaps are categorized as arising from insertion events or deletion events and then treated separately in the analysis, (2) the evolutionary signal provided by indels is harnessed in the phylogenetic analysis, and (3) methods that utilize the phylogenetic signal in indels are also developed for distance methods. When the true homology is known and the amount of gaps is 20 percent of the alignment length or less, the methods used in this study are likely to yield trees with 90–100 percent accuracy.
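As an illustration of two ideas in the abstract, the proportion of gapped sites and the coding of gapped sites as binary presence/absence characters, the following is a minimal Python sketch. It is not the authors' code: the function names, the toy alignment, and the convention of counting a site as gapped when any sequence has a `-` in that column are all assumptions made here for illustration, and the paper may define these quantities differently.

```python
def gapped_columns(alignment):
    """Return indices of columns in which at least one sequence has a gap.

    Assumption: a gap is written as '-' and all sequences have equal length.
    """
    n_cols = len(alignment[0])
    return [i for i in range(n_cols) if any(seq[i] == "-" for seq in alignment)]


def gapped_site_fraction(alignment):
    """Fraction of alignment columns that contain at least one gap."""
    return len(gapped_columns(alignment)) / len(alignment[0])


def binary_gap_coding(alignment):
    """Code each gapped column as a presence/absence character.

    For every sequence, emit '1' where a residue is present and '0' where
    there is a gap, restricted to the gapped columns. This is a simple
    illustrative scheme, not necessarily the coding used in the study.
    """
    cols = gapped_columns(alignment)
    return ["".join("0" if seq[i] == "-" else "1" for i in cols)
            for seq in alignment]


# Hypothetical toy alignment (four sequences, ten sites).
alignment = [
    "ACGT-ACGTA",
    "ACGTTACG-A",
    "AC-TTACGTA",
    "ACGTTACGTA",
]

fraction = gapped_site_fraction(alignment)
coding = binary_gap_coding(alignment)
print(fraction)
print(coding)
```

The resulting 0/1 strings could be appended to the nucleotide data as an extra character partition, which is one way a method can attribute phylogenetic signal to gapped sites instead of treating them as missing data.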
format Text
id pubmed-2746219
institution National Center for Biotechnology Information
language English
publishDate 2009
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-2746219 2009-09-18 Phylogenetic inference under varying proportions of indel-induced alignment gaps Dwivedi, Bhakti Gadagkar, Sudhindra R BMC Evol Biol Research Article BioMed Central 2009-08-23 /pmc/articles/PMC2746219/ /pubmed/19698168 http://dx.doi.org/10.1186/1471-2148-9-211 Text en Copyright © 2009 Dwivedi and Gadagkar; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title Phylogenetic inference under varying proportions of indel-induced alignment gaps
topic Research Article