
A comparative study of SVDquartets and other coalescent-based species tree estimation methods


Bibliographic Details
Main Authors: Chou, Jed; Gupta, Ashu; Yaduvanshi, Shashank; Davidson, Ruth; Nute, Mike; Mirarab, Siavash; Warnow, Tandy
Format: Online Article Text
Language: English
Published: BioMed Central 2015
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4602346/
https://www.ncbi.nlm.nih.gov/pubmed/26449249
http://dx.doi.org/10.1186/1471-2164-16-S10-S2
Description:
BACKGROUND: Species tree estimation is challenging in the presence of incomplete lineage sorting (ILS), which can make gene trees differ from the species tree. Because ILS is expected to occur, and because the standard concatenation approach can return incorrect trees with high support in the presence of ILS, "coalescent-based" summary methods (which first estimate gene trees and then combine them into a species tree) have been developed with theoretical guarantees of robustness to arbitrarily high amounts of ILS. Some studies have suggested that summary methods should only be used on "c-genes" (i.e., recombination-free loci), which can be extremely short (sometimes fewer than 100 sites). However, gene trees estimated on short alignments can have high estimation error, and summary methods tend to have high error on short c-genes. To address this problem, Chifman and Kubatko introduced SVDquartets, a new coalescent-based method. SVDquartets takes multi-locus unlinked single-site data, infers a quartet tree for every subset of four species, and then combines the resulting set of quartet trees into a species tree using a quartet amalgamation heuristic. Yet the accuracy of SVDquartets relative to leading coalescent-based methods has not been assessed.

RESULTS: We compared SVDquartets to two leading coalescent-based methods (ASTRAL-2 and NJst) and to concatenation using maximum likelihood, on a collection of simulated datasets that vary the ILS level, the number of taxa, and the number of sites per locus. Although SVDquartets was sometimes more accurate than ASTRAL-2 and NJst, the best results were most often obtained using ASTRAL-2, even on the shortest gene sequence alignments we explored (only 10 sites per locus). Finally, concatenation was the most accurate of all methods under low ILS conditions.

CONCLUSIONS: ASTRAL-2 generally had the best accuracy under higher ILS conditions, and concatenation had the best accuracy under the lowest ILS conditions. However, SVDquartets was competitive with the best methods under conditions with low ILS and small numbers of sites per locus. The good performance of ASTRAL-2 relative to SVDquartets under many conditions is surprising, given the known vulnerability of ASTRAL-2 and similar methods to short gene sequences.
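The two-stage pipeline described above for SVDquartets (one quartet tree per four-species subset, then amalgamation into a species tree) can be illustrated with a toy Python sketch. It is not the SVDquartets implementation: the per-quartet `score` function is a hypothetical placeholder for the method's actual quartet-inference criterion, a candidate species tree is represented only by its set of splits (bipartitions), and instead of running an amalgamation heuristic the sketch just counts how many inferred quartets a given candidate tree displays, the kind of agreement such heuristics typically try to maximize. All function names are illustrative.

```python
"""Toy sketch of a quartet-based species tree pipeline (illustrative only).

Assumptions (not taken from the paper): `score` stands in for the real
per-quartet inference criterion, and a tree is given by its nontrivial splits.
"""
from itertools import combinations
from typing import Callable, FrozenSet, Iterable, List, Tuple

# An unrooted quartet topology ab|cd, stored as ({a, b}, {c, d}).
Quartet = Tuple[FrozenSet[str], FrozenSet[str]]


def resolutions(q: Tuple[str, str, str, str]) -> List[Quartet]:
    """Return the three possible unrooted topologies ab|cd, ac|bd, ad|bc."""
    a, b, c, d = q
    return [
        (frozenset({a, b}), frozenset({c, d})),
        (frozenset({a, c}), frozenset({b, d})),
        (frozenset({a, d}), frozenset({b, c})),
    ]


def infer_quartets(taxa: Iterable[str],
                   score: Callable[[Quartet], float]) -> List[Quartet]:
    """Stage 1: for every 4-taxon subset, keep the lowest-scoring topology."""
    return [min(resolutions(q), key=score) for q in combinations(sorted(taxa), 4)]


def displays(splits: Iterable[FrozenSet[str]], taxa: FrozenSet[str],
             quartet: Quartet) -> bool:
    """A tree displays ab|cd iff some split puts {a,b} and {c,d} on opposite sides."""
    left, right = quartet
    for s in splits:
        other = taxa - s
        if (left <= s and right <= other) or (left <= other and right <= s):
            return True
    return False


def quartet_support(splits: Iterable[FrozenSet[str]], taxa: FrozenSet[str],
                    quartets: Iterable[Quartet]) -> int:
    """Stage 2 objective: how many inferred quartets the candidate tree displays."""
    splits = list(splits)
    return sum(displays(splits, taxa, q) for q in quartets)


if __name__ == "__main__":
    taxa = frozenset("ABCDE")
    # Hypothetical "true" tree ((A,B),C,(D,E)), given by its two nontrivial splits.
    true_splits = [frozenset("AB"), frozenset("DE")]
    # A competing candidate tree ((A,C),B,(D,E)).
    alt_splits = [frozenset("AC"), frozenset("DE")]
    # Placeholder score (NOT the SVD score): 0 if the true tree displays the quartet.
    quartets = infer_quartets(taxa, lambda q: 0 if displays(true_splits, taxa, q) else 1)
    print(quartet_support(true_splits, taxa, quartets))  # 5
    print(quartet_support(alt_splits, taxa, quartets))   # 3
```

With five taxa there are only five quartets, but the count grows as n choose 4 (210 quartets for 10 taxa), which is why the second stage relies on a heuristic search over trees rather than checking every candidate.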
Journal: BMC Genomics (Research), BioMed Central, 2015-10-02
Record: pubmed-4602346 (2015-10-14)
Rights: Copyright © 2015 Chou et al. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.