Evaluation of serverless computing for scalable execution of a joint variant calling workflow
Advances in whole-genome sequencing have greatly reduced the cost and time of obtaining raw genetic information, but the computational requirements of analysis remain a challenge. Serverless computing has emerged as an alternative to using dedicated compute resources, but its utility has not been wi...
Main authors: | John, Aji; Muenzen, Kathleen; Ausmees, Kristiina |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Public Library of Science 2021 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8270184/ https://www.ncbi.nlm.nih.gov/pubmed/34242357 http://dx.doi.org/10.1371/journal.pone.0254363 |
_version_ | 1783720749475823616 |
---|---|
author | John, Aji Muenzen, Kathleen Ausmees, Kristiina |
author_facet | John, Aji Muenzen, Kathleen Ausmees, Kristiina |
author_sort | John, Aji |
collection | PubMed |
description | Advances in whole-genome sequencing have greatly reduced the cost and time of obtaining raw genetic information, but the computational requirements of analysis remain a challenge. Serverless computing has emerged as an alternative to using dedicated compute resources, but its utility has not been widely evaluated for standardized genomic workflows. In this study, we define and execute a best-practice joint variant calling workflow using the SWEEP workflow management system. We present an analysis of performance and scalability, and discuss the utility of the serverless paradigm for executing workflows in the field of genomics research. The GATK best-practice short germline joint variant calling pipeline was implemented as a SWEEP workflow comprising 18 tasks. The workflow was executed on Illumina paired-end read samples from the European and African super populations of the 1000 Genomes project phase III. Cost and runtime increased linearly with increasing sample size, although runtime was driven primarily by a single task for larger problem sizes. Execution took a minimum of around 3 hours for 2 samples, up to nearly 13 hours for 62 samples, with costs ranging from $2 to $70. |
format | Online Article Text |
id | pubmed-8270184 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2021 |
publisher | Public Library of Science |
record_format | MEDLINE/PubMed |
spelling | pubmed-8270184 2021-07-21 Evaluation of serverless computing for scalable execution of a joint variant calling workflow John, Aji Muenzen, Kathleen Ausmees, Kristiina PLoS One Research Article Advances in whole-genome sequencing have greatly reduced the cost and time of obtaining raw genetic information, but the computational requirements of analysis remain a challenge. Serverless computing has emerged as an alternative to using dedicated compute resources, but its utility has not been widely evaluated for standardized genomic workflows. In this study, we define and execute a best-practice joint variant calling workflow using the SWEEP workflow management system. We present an analysis of performance and scalability, and discuss the utility of the serverless paradigm for executing workflows in the field of genomics research. The GATK best-practice short germline joint variant calling pipeline was implemented as a SWEEP workflow comprising 18 tasks. The workflow was executed on Illumina paired-end read samples from the European and African super populations of the 1000 Genomes project phase III. Cost and runtime increased linearly with increasing sample size, although runtime was driven primarily by a single task for larger problem sizes. Execution took a minimum of around 3 hours for 2 samples, up to nearly 13 hours for 62 samples, with costs ranging from $2 to $70. Public Library of Science 2021-07-09 /pmc/articles/PMC8270184/ /pubmed/34242357 http://dx.doi.org/10.1371/journal.pone.0254363 Text en © 2021 John et al https://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. |
spellingShingle | Research Article John, Aji Muenzen, Kathleen Ausmees, Kristiina Evaluation of serverless computing for scalable execution of a joint variant calling workflow |
title | Evaluation of serverless computing for scalable execution of a joint variant calling workflow |
title_full | Evaluation of serverless computing for scalable execution of a joint variant calling workflow |
title_fullStr | Evaluation of serverless computing for scalable execution of a joint variant calling workflow |
title_full_unstemmed | Evaluation of serverless computing for scalable execution of a joint variant calling workflow |
title_short | Evaluation of serverless computing for scalable execution of a joint variant calling workflow |
title_sort | evaluation of serverless computing for scalable execution of a joint variant calling workflow |
topic | Research Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8270184/ https://www.ncbi.nlm.nih.gov/pubmed/34242357 http://dx.doi.org/10.1371/journal.pone.0254363 |
work_keys_str_mv | AT johnaji evaluationofserverlesscomputingforscalableexecutionofajointvariantcallingworkflow AT muenzenkathleen evaluationofserverlesscomputingforscalableexecutionofajointvariantcallingworkflow AT ausmeeskristiina evaluationofserverlesscomputingforscalableexecutionofajointvariantcallingworkflow |