Cargando…
MrsRF: an efficient MapReduce algorithm for analyzing large collections of evolutionary trees
BACKGROUND: MapReduce is a parallel framework that has been used effectively to design large-scale parallel applications for large computing clusters. In this paper, we evaluate the viability of the MapReduce framework for designing phylogenetic applications. The problem of interest is generating th...
Autores principales: | , |
---|---|
Formato: | Texto |
Lenguaje: | English |
Publicado: |
BioMed Central
2010
|
Materias: | |
Acceso en línea: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3009486/ https://www.ncbi.nlm.nih.gov/pubmed/20122186 http://dx.doi.org/10.1186/1471-2105-11-S1-S15 |
_version_ | 1782194689404829696 |
---|---|
author | Matthews, Suzanne J Williams, Tiffani L |
author_facet | Matthews, Suzanne J Williams, Tiffani L |
author_sort | Matthews, Suzanne J |
collection | PubMed |
description | BACKGROUND: MapReduce is a parallel framework that has been used effectively to design large-scale parallel applications for large computing clusters. In this paper, we evaluate the viability of the MapReduce framework for designing phylogenetic applications. The problem of interest is generating the all-to-all Robinson-Foulds distance matrix, which has many applications for visualizing and clustering large collections of evolutionary trees. We introduce MrsRF (MapReduce Speeds up RF), a multi-core algorithm to generate a t × t Robinson-Foulds distance matrix between t trees using the MapReduce paradigm. RESULTS: We studied the performance of our MrsRF algorithm on two large biological trees sets consisting of 20,000 trees of 150 taxa each and 33,306 trees of 567 taxa each. Our experiments show that MrsRF is a scalable approach reaching a speedup of over 18 on 32 total cores. Our results also show that achieving top speedup on a multi-core cluster requires different cluster configurations. Finally, we show how to use an RF matrix to summarize collections of phylogenetic trees visually. CONCLUSION: Our results show that MapReduce is a promising paradigm for developing multi-core phylogenetic applications. The results also demonstrate that different multi-core configurations must be tested in order to obtain optimum performance. We conclude that RF matrices play a critical role in developing techniques to summarize large collections of trees. |
format | Text |
id | pubmed-3009486 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2010 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-30094862010-12-23 MrsRF: an efficient MapReduce algorithm for analyzing large collections of evolutionary trees Matthews, Suzanne J Williams, Tiffani L BMC Bioinformatics Research BACKGROUND: MapReduce is a parallel framework that has been used effectively to design large-scale parallel applications for large computing clusters. In this paper, we evaluate the viability of the MapReduce framework for designing phylogenetic applications. The problem of interest is generating the all-to-all Robinson-Foulds distance matrix, which has many applications for visualizing and clustering large collections of evolutionary trees. We introduce MrsRF (MapReduce Speeds up RF), a multi-core algorithm to generate a t × t Robinson-Foulds distance matrix between t trees using the MapReduce paradigm. RESULTS: We studied the performance of our MrsRF algorithm on two large biological trees sets consisting of 20,000 trees of 150 taxa each and 33,306 trees of 567 taxa each. Our experiments show that MrsRF is a scalable approach reaching a speedup of over 18 on 32 total cores. Our results also show that achieving top speedup on a multi-core cluster requires different cluster configurations. Finally, we show how to use an RF matrix to summarize collections of phylogenetic trees visually. CONCLUSION: Our results show that MapReduce is a promising paradigm for developing multi-core phylogenetic applications. The results also demonstrate that different multi-core configurations must be tested in order to obtain optimum performance. We conclude that RF matrices play a critical role in developing techniques to summarize large collections of trees. BioMed Central 2010-01-18 /pmc/articles/PMC3009486/ /pubmed/20122186 http://dx.doi.org/10.1186/1471-2105-11-S1-S15 Text en Copyright ©2010 Matthews and Williams; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. |
spellingShingle | Research Matthews, Suzanne J Williams, Tiffani L MrsRF: an efficient MapReduce algorithm for analyzing large collections of evolutionary trees |
title | MrsRF: an efficient MapReduce algorithm for analyzing large collections of evolutionary trees |
title_full | MrsRF: an efficient MapReduce algorithm for analyzing large collections of evolutionary trees |
title_fullStr | MrsRF: an efficient MapReduce algorithm for analyzing large collections of evolutionary trees |
title_full_unstemmed | MrsRF: an efficient MapReduce algorithm for analyzing large collections of evolutionary trees |
title_short | MrsRF: an efficient MapReduce algorithm for analyzing large collections of evolutionary trees |
title_sort | mrsrf: an efficient mapreduce algorithm for analyzing large collections of evolutionary trees |
topic | Research |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3009486/ https://www.ncbi.nlm.nih.gov/pubmed/20122186 http://dx.doi.org/10.1186/1471-2105-11-S1-S15 |
work_keys_str_mv | AT matthewssuzannej mrsrfanefficientmapreducealgorithmforanalyzinglargecollectionsofevolutionarytrees AT williamstiffanil mrsrfanefficientmapreducealgorithmforanalyzinglargecollectionsofevolutionarytrees |