Cargando…

A scalable method for identifying frequent subtrees in sets of large phylogenetic trees

BACKGROUND: We consider the problem of finding the maximum frequent agreement subtrees (MFASTs) in a collection of phylogenetic trees. Existing methods for this problem often do not scale beyond datasets with around 100 taxa. Our goal is to address this problem for datasets with over a thousand taxa...

Descripción completa

Detalles Bibliográficos
Autores principales:	Ramu, Avinash, Kahveci, Tamer, Burleigh, J Gordon
Formato:	Online Artículo Texto
Lenguaje:	English
Publicado:	BioMed Central 2012
Materias:	Research Article
Acceso en línea:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3543182/ https://www.ncbi.nlm.nih.gov/pubmed/23033843 http://dx.doi.org/10.1186/1471-2105-13-256

_version_	1782255608732319744
author	Ramu, Avinash Kahveci, Tamer Burleigh, J Gordon
author_facet	Ramu, Avinash Kahveci, Tamer Burleigh, J Gordon
author_sort	Ramu, Avinash
collection	PubMed
description	BACKGROUND: We consider the problem of finding the maximum frequent agreement subtrees (MFASTs) in a collection of phylogenetic trees. Existing methods for this problem often do not scale beyond datasets with around 100 taxa. Our goal is to address this problem for datasets with over a thousand taxa and hundreds of trees. RESULTS: We develop a heuristic solution that aims to find MFASTs in sets of many, large phylogenetic trees. Our method works in multiple phases. In the first phase, it identifies small candidate subtrees from the set of input trees which serve as the seeds of larger subtrees. In the second phase, it combines these small seeds to build larger candidate MFASTs. In the final phase, it performs a post-processing step that ensures that we find a frequent agreement subtree that is not contained in a larger frequent agreement subtree. We demonstrate that this heuristic can easily handle data sets with 1000 taxa, greatly extending the estimation of MFASTs beyond current methods. CONCLUSIONS: Although this heuristic does not guarantee to find all MFASTs or the largest MFAST, it found the MFAST in all of our synthetic datasets where we could verify the correctness of the result. It also performed well on large empirical data sets. Its performance is robust to the number and size of the input trees. Overall, this method provides a simple and fast way to identify strongly supported subtrees within large phylogenetic hypotheses.
format	Online Article Text
id	pubmed-3543182
institution	National Center for Biotechnology Information
language	English
publishDate	2012
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-35431822013-01-14 A scalable method for identifying frequent subtrees in sets of large phylogenetic trees Ramu, Avinash Kahveci, Tamer Burleigh, J Gordon BMC Bioinformatics Research Article BACKGROUND: We consider the problem of finding the maximum frequent agreement subtrees (MFASTs) in a collection of phylogenetic trees. Existing methods for this problem often do not scale beyond datasets with around 100 taxa. Our goal is to address this problem for datasets with over a thousand taxa and hundreds of trees. RESULTS: We develop a heuristic solution that aims to find MFASTs in sets of many, large phylogenetic trees. Our method works in multiple phases. In the first phase, it identifies small candidate subtrees from the set of input trees which serve as the seeds of larger subtrees. In the second phase, it combines these small seeds to build larger candidate MFASTs. In the final phase, it performs a post-processing step that ensures that we find a frequent agreement subtree that is not contained in a larger frequent agreement subtree. We demonstrate that this heuristic can easily handle data sets with 1000 taxa, greatly extending the estimation of MFASTs beyond current methods. CONCLUSIONS: Although this heuristic does not guarantee to find all MFASTs or the largest MFAST, it found the MFAST in all of our synthetic datasets where we could verify the correctness of the result. It also performed well on large empirical data sets. Its performance is robust to the number and size of the input trees. Overall, this method provides a simple and fast way to identify strongly supported subtrees within large phylogenetic hypotheses. BioMed Central 2012-10-03 /pmc/articles/PMC3543182/ /pubmed/23033843 http://dx.doi.org/10.1186/1471-2105-13-256 Text en Copyright ©2012 Ramu et al.; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
spellingShingle	Research Article Ramu, Avinash Kahveci, Tamer Burleigh, J Gordon A scalable method for identifying frequent subtrees in sets of large phylogenetic trees
title	A scalable method for identifying frequent subtrees in sets of large phylogenetic trees
title_full	A scalable method for identifying frequent subtrees in sets of large phylogenetic trees
title_fullStr	A scalable method for identifying frequent subtrees in sets of large phylogenetic trees
title_full_unstemmed	A scalable method for identifying frequent subtrees in sets of large phylogenetic trees
title_short	A scalable method for identifying frequent subtrees in sets of large phylogenetic trees
title_sort	scalable method for identifying frequent subtrees in sets of large phylogenetic trees
topic	Research Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3543182/ https://www.ncbi.nlm.nih.gov/pubmed/23033843 http://dx.doi.org/10.1186/1471-2105-13-256
work_keys_str_mv	AT ramuavinash ascalablemethodforidentifyingfrequentsubtreesinsetsoflargephylogenetictrees AT kahvecitamer ascalablemethodforidentifyingfrequentsubtreesinsetsoflargephylogenetictrees AT burleighjgordon ascalablemethodforidentifyingfrequentsubtreesinsetsoflargephylogenetictrees AT ramuavinash scalablemethodforidentifyingfrequentsubtreesinsetsoflargephylogenetictrees AT kahvecitamer scalablemethodforidentifyingfrequentsubtreesinsetsoflargephylogenetictrees AT burleighjgordon scalablemethodforidentifyingfrequentsubtreesinsetsoflargephylogenetictrees

A scalable method for identifying frequent subtrees in sets of large phylogenetic trees

Ejemplares similares