clustermq enables efficient parallelization of genomic analyses

MOTIVATION: High performance computing (HPC) clusters play a pivotal role in large-scale bioinformatics analysis and modeling. For the statistical computing language R, packages exist to enable a user to submit their analyses as jobs on HPC schedulers. However, these packages do not scale well to hi...

Bibliographic Details
Main author: Schubert, Michael
Format: Online Article Text
Language: English
Published: Oxford University Press 2019
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6821287/
https://www.ncbi.nlm.nih.gov/pubmed/31134271
http://dx.doi.org/10.1093/bioinformatics/btz284
_version_ 1783464116938080256
author Schubert, Michael
collection PubMed
description MOTIVATION: High performance computing (HPC) clusters play a pivotal role in large-scale bioinformatics analysis and modeling. For the statistical computing language R, packages exist to enable a user to submit their analyses as jobs on HPC schedulers. However, these packages do not scale well to high numbers of tasks, and their processing overhead quickly becomes a prohibitive bottleneck. RESULTS: Here we present clustermq, an R package that can process analyses up to three orders of magnitude faster than previously published alternatives. We show this for investigating genomic associations of drug sensitivity in cancer cell lines, but it can be applied to any kind of parallelizable workflow. AVAILABILITY AND IMPLEMENTATION: The package is available on CRAN and https://github.com/mschubert/clustermq. Code for performance testing is available at https://github.com/mschubert/clustermq-performance. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
format Online
Article
Text
id pubmed-6821287
institution National Center for Biotechnology Information
language English
publishDate 2019
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-68212872019-11-04 clustermq enables efficient parallelization of genomic analyses Schubert, Michael Bioinformatics Applications Notes MOTIVATION: High performance computing (HPC) clusters play a pivotal role in large-scale bioinformatics analysis and modeling. For the statistical computing language R, packages exist to enable a user to submit their analyses as jobs on HPC schedulers. However, these packages do not scale well to high numbers of tasks, and their processing overhead quickly becomes a prohibitive bottleneck. RESULTS: Here we present clustermq, an R package that can process analyses up to three orders of magnitude faster than previously published alternatives. We show this for investigating genomic associations of drug sensitivity in cancer cell lines, but it can be applied to any kind of parallelizable workflow. AVAILABILITY AND IMPLEMENTATION: The package is available on CRAN and https://github.com/mschubert/clustermq. Code for performance testing is available at https://github.com/mschubert/clustermq-performance. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online. Oxford University Press 2019-11-01 2019-05-27 /pmc/articles/PMC6821287/ /pubmed/31134271 http://dx.doi.org/10.1093/bioinformatics/btz284 Text en © The Author(s) 2019. Published by Oxford University Press. http://creativecommons.org/licenses/by/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
title clustermq enables efficient parallelization of genomic analyses
topic Applications Notes
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6821287/
https://www.ncbi.nlm.nih.gov/pubmed/31134271
http://dx.doi.org/10.1093/bioinformatics/btz284
work_keys_str_mv AT schubertmichael clustermqenablesefficientparallelizationofgenomicanalyses