Consensus properties for the deep coalescence problem and their application for scalable tree search

Bibliographic Details
Main authors: Lin, Harris T, Burleigh, J Gordon, Eulenstein, Oliver
Format: Online Article Text
Language: English
Published: BioMed Central 2012
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3382448/
https://www.ncbi.nlm.nih.gov/pubmed/22759417
http://dx.doi.org/10.1186/1471-2105-13-S10-S12
_version_ 1782236502319693824
author Lin, Harris T
Burleigh, J Gordon
Eulenstein, Oliver
collection PubMed
description BACKGROUND: To infer a species phylogeny from unlinked genes, phylogenetic inference methods must confront the biological processes that create incongruence between gene trees and the species phylogeny. Intra-specific gene variation in ancestral species can result in deep coalescence, also known as incomplete lineage sorting, which creates incongruence between gene trees and the species tree. One approach to account for deep coalescence in phylogenetic analyses is the deep coalescence problem, which takes a collection of gene trees and seeks the species tree that implies the fewest deep coalescence events. Although this approach is promising for phylogenetics, the consensus properties of this problem are mostly unknown and analyses of large data sets may be computationally prohibitive. RESULTS: We prove that the deep coalescence consensus tree problem satisfies the highly desirable Pareto property for clusters (clades). That is, in all instances, each cluster that is present in all of the input gene trees, called a consensus cluster, will also be found in every optimal solution. Moreover, we introduce a new divide and conquer method for the deep coalescence problem based on the Pareto property. This method refines the strict consensus of the input gene trees, thereby, in practice, often greatly reducing the complexity of the tree search and guaranteeing that the estimated species tree will satisfy the Pareto property. CONCLUSIONS: Analyses of both simulated and empirical data sets demonstrate that the divide and conquer method can greatly improve upon the speed of heuristics that do not consider the Pareto consensus property, while also guaranteeing that the proposed solution fulfills the Pareto property. The divide and conquer method extends the utility of the deep coalescence problem to data sets with enormous numbers of taxa.
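The Pareto property for clusters described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the nested-tuple tree encoding and the function names `clusters`, `consensus_clusters`, and `satisfies_pareto` are assumptions made only for this example. It collects the clusters (leaf sets) of each gene tree, intersects them to get the consensus clusters, and checks whether a candidate species tree contains all of them.

```python
from typing import FrozenSet, List, Set, Tuple, Union

# Hypothetical encoding: a leaf is a taxon name, an internal node is a tuple of subtrees.
Tree = Union[str, Tuple["Tree", ...]]


def clusters(tree: Tree) -> Set[FrozenSet[str]]:
    """Return the leaf set (cluster) of every node in the tree."""
    found: Set[FrozenSet[str]] = set()

    def visit(node: Tree) -> FrozenSet[str]:
        if isinstance(node, str):
            leaves = frozenset([node])
        else:
            # The cluster of an internal node is the union of its children's clusters.
            leaves = frozenset().union(*(visit(child) for child in node))
        found.add(leaves)
        return leaves

    visit(tree)
    return found


def consensus_clusters(gene_trees: List[Tree]) -> Set[FrozenSet[str]]:
    """Clusters present in all input gene trees (the list must be non-empty)."""
    return set.intersection(*(clusters(t) for t in gene_trees))


def satisfies_pareto(candidate: Tree, gene_trees: List[Tree]) -> bool:
    """True if the candidate species tree contains every consensus cluster."""
    return consensus_clusters(gene_trees) <= clusters(candidate)


# Two toy gene trees over taxa a..e; they share only the non-trivial cluster {a, b}.
g1: Tree = ((("a", "b"), "c"), ("d", "e"))
g2: Tree = ((("a", "b"), "d"), ("c", "e"))

keeps_ab: Tree = (("a", "b"), ("c", ("d", "e")))   # contains {a, b}
breaks_ab: Tree = (("a", "c"), ("b", ("d", "e")))  # separates a from b
```

With these toy trees, `satisfies_pareto(keeps_ab, [g1, g2])` holds and `satisfies_pareto(breaks_ab, [g1, g2])` does not. The sketch only checks the property on a given tree; it does not compute deep coalescence costs or perform the divide and conquer search described in the abstract.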
format Online
Article
Text
id pubmed-3382448
institution National Center for Biotechnology Information
language English
publishDate 2012
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-3382448 2012-06-28 Consensus properties for the deep coalescence problem and their application for scalable tree search Lin, Harris T Burleigh, J Gordon Eulenstein, Oliver BMC Bioinformatics Proceedings BioMed Central 2012-06-25 /pmc/articles/PMC3382448/ /pubmed/22759417 http://dx.doi.org/10.1186/1471-2105-13-S10-S12 Text en Copyright ©2012 Lin et al.; licensee BioMed Central Ltd. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title Consensus properties for the deep coalescence problem and their application for scalable tree search
topic Proceedings
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3382448/
https://www.ncbi.nlm.nih.gov/pubmed/22759417
http://dx.doi.org/10.1186/1471-2105-13-S10-S12