clubber: removing the bioinformatics bottleneck in big data analyses

With the advent of modern day high-throughput technologies, the bottleneck in biological discovery has shifted from the cost of doing experiments to that of analyzing results. clubber is our automated cluster-load balancing system developed for optimizing these “big data” analyses. Its plug-and-play...

Bibliographic Details
Main authors:	Miller, Maximilian; Zhu, Chengsheng; Bromberg, Yana
Format:	Online Article Text
Language:	English
Published:	De Gruyter 2017
Subjects:	Research Articles
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5929469/ https://www.ncbi.nlm.nih.gov/pubmed/28609295 http://dx.doi.org/10.1515/jib-2017-0020

_version_	1783319413153333248
author	Miller, Maximilian; Zhu, Chengsheng; Bromberg, Yana
author_facet	Miller, Maximilian; Zhu, Chengsheng; Bromberg, Yana
author_sort	Miller, Maximilian
collection	PubMed
description	With the advent of modern day high-throughput technologies, the bottleneck in biological discovery has shifted from the cost of doing experiments to that of analyzing results. clubber is our automated cluster-load balancing system developed for optimizing these “big data” analyses. Its plug-and-play framework encourages re-use of existing solutions for bioinformatics problems. clubber’s goals are to reduce computation times and to facilitate use of cluster computing. The first goal is achieved by automating the balance of parallel submissions across available high performance computing (HPC) resources. Notably, the latter can be added on demand, including cloud-based resources, and/or featuring heterogeneous environments. The second goal of making HPCs user-friendly is facilitated by an interactive web interface and a RESTful API, allowing for job monitoring and result retrieval. We used clubber to speed up our pipeline for annotating molecular functionality of metagenomes. Here, we analyzed the Deepwater Horizon oil-spill study data to quantitatively show that the beach sands have not yet entirely recovered. Further, our analysis of the CAMI-challenge data revealed that microbiome taxonomic shifts do not necessarily correlate with functional shifts. These examples (21 metagenomes processed in 172 min) clearly illustrate the importance of clubber in the everyday computational biology environment.
format	Online Article Text
id	pubmed-5929469
institution	National Center for Biotechnology Information
language	English
publishDate	2017
publisher	De Gruyter
record_format	MEDLINE/PubMed
spelling	pubmed-5929469 2018-06-13 clubber: removing the bioinformatics bottleneck in big data analyses Miller, Maximilian; Zhu, Chengsheng; Bromberg, Yana J Integr Bioinform Research Articles De Gruyter 2017-06-13 /pmc/articles/PMC5929469/ /pubmed/28609295 http://dx.doi.org/10.1515/jib-2017-0020 Text en ©2017, M. Miller, published by De Gruyter, Berlin/Boston http://creativecommons.org/licenses/by-nc-nd/3.0 This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 3.0 License.
spellingShingle	Research Articles Miller, Maximilian Zhu, Chengsheng Bromberg, Yana clubber: removing the bioinformatics bottleneck in big data analyses
title	clubber: removing the bioinformatics bottleneck in big data analyses
topic	Research Articles
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5929469/ https://www.ncbi.nlm.nih.gov/pubmed/28609295 http://dx.doi.org/10.1515/jib-2017-0020

Similar items