Accumulating computational resource usage of genomic data analysis workflow to optimize cloud computing instance selection

BACKGROUND: Container virtualization technologies such as Docker are popular in the bioinformatics domain because they improve the portability and reproducibility of software deployment. Along with software packaged in containers, the standardized workflow descriptors Common Workflow Language (CWL)...

Bibliographic Details
Main Authors: Ohta, Tazro; Tanjo, Tomoya; Ogasawara, Osamu
Format: Online Article Text
Language: English
Published: Oxford University Press 2019
Subjects: Research
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6479428/
https://www.ncbi.nlm.nih.gov/pubmed/31222199
http://dx.doi.org/10.1093/gigascience/giz052
author Ohta, Tazro
Tanjo, Tomoya
Ogasawara, Osamu
collection PubMed
description BACKGROUND: Container virtualization technologies such as Docker are popular in the bioinformatics domain because they improve the portability and reproducibility of software deployment. Along with software packaged in containers, the standardized workflow descriptors Common Workflow Language (CWL) enable data to be easily analyzed on multiple computing environments. These technologies accelerate the use of on-demand cloud computing platforms, which can be scaled according to the quantity of data. However, to optimize the time and budgetary restraints of cloud usage, users must select a suitable instance type that corresponds to the resource requirements of their workflows. RESULTS: We developed CWL-metrics, a utility tool for cwltool (the reference implementation of CWL), to collect runtime metrics of Docker containers and workflow metadata to analyze workflow resource requirements. To demonstrate the use of this tool, we analyzed 7 transcriptome quantification workflows on 6 instance types. The results revealed that choice of instance type can deliver lower financial costs and faster execution times using the required amount of computational resources. CONCLUSIONS: CWL-metrics can generate a summary of resource requirements for workflow executions, which can help users to optimize their use of cloud computing by selecting appropriate instances. The runtime metrics data generated by CWL-metrics can also help users to share workflows between different workflow management frameworks.
format Online
Article
Text
id pubmed-6479428
institution National Center for Biotechnology Information
language English
publishDate 2019
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-6479428 2019-05-01 Gigascience Research Oxford University Press 2019-04-24 /pmc/articles/PMC6479428/ /pubmed/31222199 http://dx.doi.org/10.1093/gigascience/giz052 Text en © The Author(s) 2019. Published by Oxford University Press. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
title Accumulating computational resource usage of genomic data analysis workflow to optimize cloud computing instance selection
topic Research