The Prevalence and Impact of Model Violations in Phylogenetic Analysis

In phylogenetic inference, we commonly use models of substitution which assume that sequence evolution is stationary, reversible, and homogeneous (SRH). Although the use of such models is often criticized, the extent of SRH violations and their effects on phylogenetic inference of tree topologies an...

Bibliographic Details
Main authors: Naser-Khdour, Suha; Minh, Bui Quang; Zhang, Wenqi; Stone, Eric A; Lanfear, Robert
Format: Online Article Text
Language: English
Published: Oxford University Press 2019
Subjects: Research Article
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6893154/
https://www.ncbi.nlm.nih.gov/pubmed/31536115
http://dx.doi.org/10.1093/gbe/evz193
_version_ 1783476149982068736
author Naser-Khdour, Suha
Minh, Bui Quang
Zhang, Wenqi
Stone, Eric A
Lanfear, Robert
collection PubMed
description In phylogenetic inference, we commonly use models of substitution which assume that sequence evolution is stationary, reversible, and homogeneous (SRH). Although the use of such models is often criticized, the extent of SRH violations and their effects on phylogenetic inference of tree topologies and edge lengths are not well understood. Here, we introduce and apply the maximal matched-pairs tests of homogeneity to assess the scale and impact of SRH model violations on 3,572 partitions from 35 published phylogenetic data sets. We show that roughly one-quarter of all the partitions we analyzed (23.5%) reject the SRH assumptions, and that for 25% of data sets, tree topologies inferred from all partitions differ significantly from topologies inferred using the subset of partitions that do not reject the SRH assumptions. This proportion increases when comparing trees inferred using the subset of partitions that rejects the SRH assumptions, to those inferred from partitions that do not reject the SRH assumptions. These results suggest that the extent and effects of model violation in phylogenetics may be substantial. They highlight the importance of testing for model violations and possibly excluding partitions that violate models prior to tree reconstruction. Our results also suggest that further effort in developing models that do not require SRH assumptions could lead to large improvements in the accuracy of phylogenomic inference. The scripts necessary to perform the analysis are available in https://github.com/roblanf/SRHtests, and the new tests we describe are available as a new option in IQ-TREE (http://www.iqtree.org).
format Online
Article
Text
id pubmed-6893154
institution National Center for Biotechnology Information
language English
publishDate 2019
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-6893154 2019-12-10 The Prevalence and Impact of Model Violations in Phylogenetic Analysis Naser-Khdour, Suha; Minh, Bui Quang; Zhang, Wenqi; Stone, Eric A; Lanfear, Robert Genome Biol Evol Research Article
Oxford University Press 2019-09-19 /pmc/articles/PMC6893154/ /pubmed/31536115 http://dx.doi.org/10.1093/gbe/evz193 Text en © The Author(s) 2019. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
title The Prevalence and Impact of Model Violations in Phylogenetic Analysis
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6893154/
https://www.ncbi.nlm.nih.gov/pubmed/31536115
http://dx.doi.org/10.1093/gbe/evz193
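
The abstract in this record refers to matched-pairs tests of homogeneity but does not describe how they are computed. As a rough illustration only, the sketch below implements the classic Bowker matched-pairs test of symmetry for one pair of aligned nucleotide sequences. It is an assumption that this is representative of the family of tests the authors mean; it is not their implementation, and it does not address how a "maximal" pair is chosen, which the abstract does not state. The function names `divergence_matrix` and `bowker_statistic` are hypothetical.

```python
from itertools import combinations

NUCLEOTIDES = "ACGT"


def divergence_matrix(seq_a, seq_b):
    """Count aligned site patterns for two sequences.

    counts[i][j] is the number of sites with NUCLEOTIDES[i] in seq_a
    and NUCLEOTIDES[j] in seq_b. Sites containing gaps or ambiguity
    codes are skipped (an assumption made for this sketch).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    index = {base: k for k, base in enumerate(NUCLEOTIDES)}
    counts = [[0] * len(NUCLEOTIDES) for _ in NUCLEOTIDES]
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a in index and b in index:
            counts[index[a]][index[b]] += 1
    return counts


def bowker_statistic(counts):
    """Return (statistic, degrees_of_freedom) for Bowker's test of symmetry.

    The statistic sums (n_ij - n_ji)^2 / (n_ij + n_ji) over all i < j
    whose off-diagonal pair has a non-zero total; the degrees of freedom
    are the number of pairs that contribute.
    """
    statistic = 0.0
    dof = 0
    for i, j in combinations(range(len(counts)), 2):
        total = counts[i][j] + counts[j][i]
        if total > 0:
            statistic += (counts[i][j] - counts[j][i]) ** 2 / total
            dof += 1
    return statistic, dof


if __name__ == "__main__":
    # Toy sequences, invented for illustration only.
    stat, dof = bowker_statistic(divergence_matrix("AAAAC", "CCCCA"))
    print(stat, dof)
```

Under the null hypothesis of symmetry the statistic is compared against a chi-square distribution with the returned degrees of freedom; if SciPy is available, `scipy.stats.chi2.sf(statistic, dof)` would give the corresponding p-value when `dof` is greater than zero.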