Suppression of long-branch attraction artefacts in the animal phylogeny using a site-heterogeneous model

Bibliographic Details
Main authors: Lartillot, Nicolas; Brinkmann, Henner; Philippe, Hervé
Format: Text
Language: English
Published: BioMed Central, 2007
Subjects: Research
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1796613/
https://www.ncbi.nlm.nih.gov/pubmed/17288577
http://dx.doi.org/10.1186/1471-2148-7-S1-S4
collection PubMed
description BACKGROUND: Thanks to the large amount of signal contained in genome-wide sequence alignments, phylogenomic analyses are converging towards highly supported trees. However, high statistical support does not imply that the tree is accurate. Systematic errors, such as the Long Branch Attraction (LBA) artefact, can be misleading, in particular when the taxon sampling is poor or the outgroup is distant. In an otherwise consistent probabilistic framework, systematic errors in genome-wide analyses can be traced back to model mis-specification, which suggests that better models of sequence evolution should be devised that are more robust to tree reconstruction artefacts, even under the most challenging conditions.
METHODS: We focus on a well-characterized LBA artefact analyzed in a previous phylogenomic study of the metazoan tree, in which two fast-evolving animal phyla, nematodes and platyhelminths, emerge either at the base of all other Bilateria or within protostomes, depending on the outgroup. We use this artefactual result as a case study for comparing the robustness of two alternative models: a standard, site-homogeneous model based on an empirical matrix of amino-acid replacement (WAG), and a site-heterogeneous mixture model (CAT). In parallel, we propose a posterior predictive test that measures how well a model accounts for sequence saturation.
RESULTS: Adopting a Bayesian framework, we show that the LBA artefact observed under WAG disappears when the site-heterogeneous model CAT is used. Using cross-validation, we further demonstrate that CAT has a better statistical fit than WAG on this data set. Finally, using our statistical goodness-of-fit test, we show that CAT, but not WAG, correctly accounts for the overall level of saturation, and that this is due to a better estimation of site-specific amino-acid preferences.
CONCLUSION: The CAT model appears to be more robust than WAG against LBA artefacts, essentially because it correctly anticipates the high probability of convergences and reversions implied by the small effective size of the amino-acid alphabet at each site of the alignment. More generally, our results provide strong evidence that site-specificities in the substitution process need to be accounted for in order to obtain more reliable phylogenetic trees.
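The CONCLUSION's argument, that a small effective amino-acid alphabet at each site raises the probability of convergences and reversions, can be illustrated with a short calculation. Under a stationary reversible model, two lineages separated by very long branches end up in states drawn essentially independently from the equilibrium frequencies π, so they share a residue with probability Σ_a π_a², and 1 / Σ_a π_a² behaves like an effective alphabet size. The Python sketch below is a toy under stated assumptions, not the authors' method or any PhyloBayes code: it assumes that the pooled frequencies of the whole alignment stand in for the single profile of a site-homogeneous model (as with WAG), and that raw per-column frequencies stand in for the site-specific profiles that CAT infers through its mixture. The function names and the four-sequence alignment are hypothetical.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def profile(residues):
    """Relative frequency of each of the 20 amino acids in `residues`.

    Toy estimate: assumes gap-free input and uses no pseudocounts.
    """
    counts = Counter(residues)
    total = len(residues)
    return {aa: counts[aa] / total for aa in AMINO_ACIDS}


def convergence_prob(pi):
    """Probability that two independent draws from `pi` give the same state (sum of pi_a^2).

    At saturation, this is the chance that two long branches end in the same residue.
    """
    return sum(p * p for p in pi.values())


def effective_alphabet(pi):
    """Effective number of states, 1 / sum(pi_a^2)."""
    return 1.0 / convergence_prob(pi)


def compare_profiles(alignment):
    """Return (pooled, site_specific) expected convergence probabilities.

    pooled        : one profile shared by all sites (stand-in for a site-homogeneous model)
    site_specific : mean over columns of each column's own profile (crude stand-in for CAT)
    """
    if len({len(seq) for seq in alignment}) != 1:
        raise ValueError("all sequences must have the same length")
    columns = ["".join(col) for col in zip(*alignment)]
    pooled = convergence_prob(profile("".join(alignment)))
    site_specific = sum(convergence_prob(profile(col)) for col in columns) / len(columns)
    return pooled, site_specific


if __name__ == "__main__":
    # Hypothetical toy alignment: every column uses only two amino acids.
    toy = ["ALKD", "ALRE", "VIKD", "VIRE"]
    pooled, site_specific = compare_profiles(toy)
    print(f"pooled: {pooled:.3f}, site-specific: {site_specific:.3f}")
    # prints "pooled: 0.125, site-specific: 0.500"
```

On this toy alignment each column uses only two residues, so the per-column profiles predict a chance-convergence probability of 0.5 (effective alphabet of 2), whereas the pooled profile predicts 0.125 (effective alphabet of 8). A model restricted to the pooled profile would therefore expect far fewer chance convergences between long branches than the site-specific profiles do.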
id pubmed-1796613
institution National Center for Biotechnology Information
record_format MEDLINE/PubMed
Journal: BMC Evol Biol, 2007-02-08
Copyright © 2007 Lartillot et al; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
topic Research