
Visualizing differences in phylogenetic information content of alignments and distinction of three classes of long-branch effects

BACKGROUND: Published molecular phylogenies are usually based on data whose quality has not been explored prior to tree inference. This leads to errors because trees obtained with conventional methods suppress conflicting evidence, and because support values may be high even if there is no distinct...


Bibliographic Details
Main Authors: Wägele, Johann Wolfgang; Mayer, Christoph
Format: Text
Language: English
Published: BioMed Central 2007
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2040160/
https://www.ncbi.nlm.nih.gov/pubmed/17725833
http://dx.doi.org/10.1186/1471-2148-7-147
author Wägele, Johann Wolfgang
Mayer, Christoph
collection PubMed
description BACKGROUND: Published molecular phylogenies are usually based on data whose quality has not been explored prior to tree inference. This leads to errors because trees obtained with conventional methods suppress conflicting evidence, and because support values may be high even if there is no distinct phylogenetic signal. Tools that allow an a priori examination of data quality are rarely applied. RESULTS: Using data from published molecular analyses on the phylogeny of crustaceans, it is shown that tree topologies and popular support values do not show existing differences in data quality. To visualize variations in signal distinctness, we use network analyses based on split decomposition and split support spectra. Both methods show the same differences in data quality and the same clade-supporting patterns. Both methods are useful to discover long-branch effects. We discern three classes of long-branch effects. Class I effects consist of attraction of terminal taxa caused by symplesiomorphies, which results in a false monophyly of paraphyletic groups. Addition of carefully selected taxa can fix this effect. Class II effects are caused by drastic signal erosion. Long branches affected by this phenomenon usually slip down the tree to form false clades that in reality are polyphyletic. To recover the correct phylogeny, more conservative genes must be used. Class III effects consist of attraction due to accumulated chance similarities or convergent character states. This sort of noise can be reduced by selecting less variable portions of the data set, avoiding biases, and adding slower genes. CONCLUSION: To increase confidence in molecular phylogenies, an exploratory analysis of the signal-to-noise ratio can be conducted with split decomposition methods. If long-branch effects are detected, it is necessary to discern between three classes of effects to find the best approach for an improvement of the raw data.
format Text
id pubmed-2040160
institution National Center for Biotechnology Information
language English
publishDate 2007
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-2040160 2007-10-23 Visualizing differences in phylogenetic information content of alignments and distinction of three classes of long-branch effects Wägele, Johann Wolfgang Mayer, Christoph BMC Evol Biol Research Article BioMed Central 2007-08-28 /pmc/articles/PMC2040160/ /pubmed/17725833 http://dx.doi.org/10.1186/1471-2148-7-147 Text en Copyright © 2007 Wägele and Mayer; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title Visualizing differences in phylogenetic information content of alignments and distinction of three classes of long-branch effects
topic Research Article