Cargando…

MrsRF: an efficient MapReduce algorithm for analyzing large collections of evolutionary trees

BACKGROUND: MapReduce is a parallel framework that has been used effectively to design large-scale parallel applications for large computing clusters. In this paper, we evaluate the viability of the MapReduce framework for designing phylogenetic applications. The problem of interest is generating th...

Descripción completa

Detalles Bibliográficos
Autores principales: Matthews, Suzanne J, Williams, Tiffani L
Formato: Texto
Lenguaje:English
Publicado: BioMed Central 2010
Materias:
Acceso en línea:https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3009486/
https://www.ncbi.nlm.nih.gov/pubmed/20122186
http://dx.doi.org/10.1186/1471-2105-11-S1-S15
_version_ 1782194689404829696
author Matthews, Suzanne J
Williams, Tiffani L
author_facet Matthews, Suzanne J
Williams, Tiffani L
author_sort Matthews, Suzanne J
collection PubMed
description BACKGROUND: MapReduce is a parallel framework that has been used effectively to design large-scale parallel applications for large computing clusters. In this paper, we evaluate the viability of the MapReduce framework for designing phylogenetic applications. The problem of interest is generating the all-to-all Robinson-Foulds distance matrix, which has many applications for visualizing and clustering large collections of evolutionary trees. We introduce MrsRF (MapReduce Speeds up RF), a multi-core algorithm to generate a t × t Robinson-Foulds distance matrix between t trees using the MapReduce paradigm. RESULTS: We studied the performance of our MrsRF algorithm on two large biological trees sets consisting of 20,000 trees of 150 taxa each and 33,306 trees of 567 taxa each. Our experiments show that MrsRF is a scalable approach reaching a speedup of over 18 on 32 total cores. Our results also show that achieving top speedup on a multi-core cluster requires different cluster configurations. Finally, we show how to use an RF matrix to summarize collections of phylogenetic trees visually. CONCLUSION: Our results show that MapReduce is a promising paradigm for developing multi-core phylogenetic applications. The results also demonstrate that different multi-core configurations must be tested in order to obtain optimum performance. We conclude that RF matrices play a critical role in developing techniques to summarize large collections of trees.
format Text
id pubmed-3009486
institution National Center for Biotechnology Information
language English
publishDate 2010
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-30094862010-12-23 MrsRF: an efficient MapReduce algorithm for analyzing large collections of evolutionary trees Matthews, Suzanne J Williams, Tiffani L BMC Bioinformatics Research BACKGROUND: MapReduce is a parallel framework that has been used effectively to design large-scale parallel applications for large computing clusters. In this paper, we evaluate the viability of the MapReduce framework for designing phylogenetic applications. The problem of interest is generating the all-to-all Robinson-Foulds distance matrix, which has many applications for visualizing and clustering large collections of evolutionary trees. We introduce MrsRF (MapReduce Speeds up RF), a multi-core algorithm to generate a t × t Robinson-Foulds distance matrix between t trees using the MapReduce paradigm. RESULTS: We studied the performance of our MrsRF algorithm on two large biological trees sets consisting of 20,000 trees of 150 taxa each and 33,306 trees of 567 taxa each. Our experiments show that MrsRF is a scalable approach reaching a speedup of over 18 on 32 total cores. Our results also show that achieving top speedup on a multi-core cluster requires different cluster configurations. Finally, we show how to use an RF matrix to summarize collections of phylogenetic trees visually. CONCLUSION: Our results show that MapReduce is a promising paradigm for developing multi-core phylogenetic applications. The results also demonstrate that different multi-core configurations must be tested in order to obtain optimum performance. We conclude that RF matrices play a critical role in developing techniques to summarize large collections of trees. BioMed Central 2010-01-18 /pmc/articles/PMC3009486/ /pubmed/20122186 http://dx.doi.org/10.1186/1471-2105-11-S1-S15 Text en Copyright ©2010 Matthews and Williams; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
spellingShingle Research
Matthews, Suzanne J
Williams, Tiffani L
MrsRF: an efficient MapReduce algorithm for analyzing large collections of evolutionary trees
title MrsRF: an efficient MapReduce algorithm for analyzing large collections of evolutionary trees
title_full MrsRF: an efficient MapReduce algorithm for analyzing large collections of evolutionary trees
title_fullStr MrsRF: an efficient MapReduce algorithm for analyzing large collections of evolutionary trees
title_full_unstemmed MrsRF: an efficient MapReduce algorithm for analyzing large collections of evolutionary trees
title_short MrsRF: an efficient MapReduce algorithm for analyzing large collections of evolutionary trees
title_sort mrsrf: an efficient mapreduce algorithm for analyzing large collections of evolutionary trees
topic Research
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3009486/
https://www.ncbi.nlm.nih.gov/pubmed/20122186
http://dx.doi.org/10.1186/1471-2105-11-S1-S15
work_keys_str_mv AT matthewssuzannej mrsrfanefficientmapreducealgorithmforanalyzinglargecollectionsofevolutionarytrees
AT williamstiffanil mrsrfanefficientmapreducealgorithmforanalyzinglargecollectionsofevolutionarytrees