clustermq enables efficient parallelization of genomic analyses
MOTIVATION: High performance computing (HPC) clusters play a pivotal role in large-scale bioinformatics analysis and modeling. For the statistical computing language R, packages exist to enable a user to submit their analyses as jobs on HPC schedulers. However, these packages do not scale well to hi...
Main author: | Schubert, Michael |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Oxford University Press 2019 |
Subjects: | Applications Notes |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6821287/ https://www.ncbi.nlm.nih.gov/pubmed/31134271 http://dx.doi.org/10.1093/bioinformatics/btz284 |
_version_ | 1783464116938080256 |
---|---|
author | Schubert, Michael |
author_facet | Schubert, Michael |
author_sort | Schubert, Michael |
collection | PubMed |
description | MOTIVATION: High performance computing (HPC) clusters play a pivotal role in large-scale bioinformatics analysis and modeling. For the statistical computing language R, packages exist to enable a user to submit their analyses as jobs on HPC schedulers. However, these packages do not scale well to high numbers of tasks, and their processing overhead quickly becomes a prohibitive bottleneck. RESULTS: Here we present clustermq, an R package that can process analyses up to three orders of magnitude faster than previously published alternatives. We show this for investigating genomic associations of drug sensitivity in cancer cell lines, but it can be applied to any kind of parallelizable workflow. AVAILABILITY AND IMPLEMENTATION: The package is available on CRAN and https://github.com/mschubert/clustermq. Code for performance testing is available at https://github.com/mschubert/clustermq-performance. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online. |
format | Online Article Text |
id | pubmed-6821287 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2019 |
publisher | Oxford University Press |
record_format | MEDLINE/PubMed |
spelling | pubmed-68212872019-11-04 clustermq enables efficient parallelization of genomic analyses Schubert, Michael Bioinformatics Applications Notes MOTIVATION: High performance computing (HPC) clusters play a pivotal role in large-scale bioinformatics analysis and modeling. For the statistical computing language R, packages exist to enable a user to submit their analyses as jobs on HPC schedulers. However, these packages do not scale well to high numbers of tasks, and their processing overhead quickly becomes a prohibitive bottleneck. RESULTS: Here we present clustermq, an R package that can process analyses up to three orders of magnitude faster than previously published alternatives. We show this for investigating genomic associations of drug sensitivity in cancer cell lines, but it can be applied to any kind of parallelizable workflow. AVAILABILITY AND IMPLEMENTATION: The package is available on CRAN and https://github.com/mschubert/clustermq. Code for performance testing is available at https://github.com/mschubert/clustermq-performance. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online. Oxford University Press 2019-11-01 2019-05-27 /pmc/articles/PMC6821287/ /pubmed/31134271 http://dx.doi.org/10.1093/bioinformatics/btz284 Text en © The Author(s) 2019. Published by Oxford University Press. http://creativecommons.org/licenses/by/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited. |
spellingShingle | Applications Notes Schubert, Michael clustermq enables efficient parallelization of genomic analyses |
title | clustermq enables efficient parallelization of genomic analyses |
title_full | clustermq enables efficient parallelization of genomic analyses |
title_fullStr | clustermq enables efficient parallelization of genomic analyses |
title_full_unstemmed | clustermq enables efficient parallelization of genomic analyses |
title_short | clustermq enables efficient parallelization of genomic analyses |
title_sort | clustermq enables efficient parallelization of genomic analyses |
topic | Applications Notes |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6821287/ https://www.ncbi.nlm.nih.gov/pubmed/31134271 http://dx.doi.org/10.1093/bioinformatics/btz284 |
work_keys_str_mv | AT schubertmichael clustermqenablesefficientparallelizationofgenomicanalyses |