Evaluation of serverless computing for scalable execution of a joint variant calling workflow

Advances in whole-genome sequencing have greatly reduced the cost and time of obtaining raw genetic information, but the computational requirements of analysis remain a challenge. Serverless computing has emerged as an alternative to using dedicated compute resources, but its utility has not been wi...

Bibliographic Details
Main authors: John, Aji; Muenzen, Kathleen; Ausmees, Kristiina
Format: Online Article Text
Language: English
Published: Public Library of Science 2021
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8270184/
https://www.ncbi.nlm.nih.gov/pubmed/34242357
http://dx.doi.org/10.1371/journal.pone.0254363
author John, Aji
Muenzen, Kathleen
Ausmees, Kristiina
collection PubMed
description Advances in whole-genome sequencing have greatly reduced the cost and time of obtaining raw genetic information, but the computational requirements of analysis remain a challenge. Serverless computing has emerged as an alternative to using dedicated compute resources, but its utility has not been widely evaluated for standardized genomic workflows. In this study, we define and execute a best-practice joint variant calling workflow using the SWEEP workflow management system. We present an analysis of performance and scalability, and discuss the utility of the serverless paradigm for executing workflows in the field of genomics research. The GATK best-practice short germline joint variant calling pipeline was implemented as a SWEEP workflow comprising 18 tasks. The workflow was executed on Illumina paired-end read samples from the European and African super populations of the 1000 Genomes project phase III. Cost and runtime increased linearly with increasing sample size, although runtime was driven primarily by a single task for larger problem sizes. Execution took a minimum of around 3 hours for 2 samples, up to nearly 13 hours for 62 samples, with costs ranging from $2 to $70.
format Online
Article
Text
id pubmed-8270184
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-8270184 2021-07-21 Evaluation of serverless computing for scalable execution of a joint variant calling workflow John, Aji Muenzen, Kathleen Ausmees, Kristiina PLoS One Research Article Public Library of Science 2021-07-09 /pmc/articles/PMC8270184/ /pubmed/34242357 http://dx.doi.org/10.1371/journal.pone.0254363 Text en © 2021 John et al https://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
title Evaluation of serverless computing for scalable execution of a joint variant calling workflow
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8270184/
https://www.ncbi.nlm.nih.gov/pubmed/34242357
http://dx.doi.org/10.1371/journal.pone.0254363