OVarFlow: a resource optimized GATK 4 based Open source Variant calling workFlow

BACKGROUND: The advent of next generation sequencing has opened new avenues for basic and applied research. One application is the discovery of sequence variants causative of a phenotypic trait or a disease pathology. The computational task of detecting and annotating sequence differences of a targe...

Bibliographic Details
Main authors: Bathke, Jochen; Lühken, Gesine
Format: Online Article Text
Language: English
Published: BioMed Central 2021
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8361789/
https://www.ncbi.nlm.nih.gov/pubmed/34388963
http://dx.doi.org/10.1186/s12859-021-04317-y
_version_ 1783738019520446464
author Bathke, Jochen
Lühken, Gesine
author_facet Bathke, Jochen
Lühken, Gesine
author_sort Bathke, Jochen
collection PubMed
description BACKGROUND: The advent of next generation sequencing has opened new avenues for basic and applied research. One application is the discovery of sequence variants causative of a phenotypic trait or a disease pathology. The computational task of detecting and annotating sequence differences of a target dataset between a reference genome is known as "variant calling". Typically, this task is computationally involved, often combining a complex chain of linked software tools. A major player in this field is the Genome Analysis Toolkit (GATK). The "GATK Best Practices" is a commonly referred recipe for variant calling. However, current computational recommendations on variant calling predominantly focus on human sequencing data and ignore ever-changing demands of high-throughput sequencing developments. Furthermore, frequent updates to such recommendations are counterintuitive to the goal of offering a standard workflow and hamper reproducibility over time. RESULTS: A workflow for automated detection of single nucleotide polymorphisms and insertion-deletions offers a wide range of applications in sequence annotation of model and non-model organisms. The introduced workflow builds on the GATK Best Practices, while enabling reproducibility over time and offering an open, generalized computational architecture. The workflow achieves parallelized data evaluation and maximizes performance of individual computational tasks. Optimized Java garbage collection and heap size settings for the GATK applications SortSam, MarkDuplicates, HaplotypeCaller, and GatherVcfs effectively cut the overall analysis time in half. CONCLUSIONS: The demand for variant calling, efficient computational processing, and standardized workflows is growing. The Open source Variant calling workFlow (OVarFlow) offers automation and reproducibility for a computationally optimized variant calling task. 
By reducing usage of computational resources, the workflow removes prior existing entry barriers to the variant calling field and enables standardized variant calling.
format Online
Article
Text
id pubmed-8361789
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-83617892021-08-17 OVarFlow: a resource optimized GATK 4 based Open source Variant calling workFlow Bathke, Jochen Lühken, Gesine BMC Bioinformatics Software BACKGROUND: The advent of next generation sequencing has opened new avenues for basic and applied research. One application is the discovery of sequence variants causative of a phenotypic trait or a disease pathology. The computational task of detecting and annotating sequence differences of a target dataset between a reference genome is known as "variant calling". Typically, this task is computationally involved, often combining a complex chain of linked software tools. A major player in this field is the Genome Analysis Toolkit (GATK). The "GATK Best Practices" is a commonly referred recipe for variant calling. However, current computational recommendations on variant calling predominantly focus on human sequencing data and ignore ever-changing demands of high-throughput sequencing developments. Furthermore, frequent updates to such recommendations are counterintuitive to the goal of offering a standard workflow and hamper reproducibility over time. RESULTS: A workflow for automated detection of single nucleotide polymorphisms and insertion-deletions offers a wide range of applications in sequence annotation of model and non-model organisms. The introduced workflow builds on the GATK Best Practices, while enabling reproducibility over time and offering an open, generalized computational architecture. The workflow achieves parallelized data evaluation and maximizes performance of individual computational tasks. Optimized Java garbage collection and heap size settings for the GATK applications SortSam, MarkDuplicates, HaplotypeCaller, and GatherVcfs effectively cut the overall analysis time in half. CONCLUSIONS: The demand for variant calling, efficient computational processing, and standardized workflows is growing. 
The Open source Variant calling workFlow (OVarFlow) offers automation and reproducibility for a computationally optimized variant calling task. By reducing usage of computational resources, the workflow removes prior existing entry barriers to the variant calling field and enables standardized variant calling. BioMed Central 2021-08-13 /pmc/articles/PMC8361789/ /pubmed/34388963 http://dx.doi.org/10.1186/s12859-021-04317-y Text en © The Author(s) 2021 https://creativecommons.org/licenses/by/4.0/Open AccessThis article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ (https://creativecommons.org/licenses/by/4.0/) . The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/ (https://creativecommons.org/publicdomain/zero/1.0/) ) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
spellingShingle Software
Bathke, Jochen
Lühken, Gesine
OVarFlow: a resource optimized GATK 4 based Open source Variant calling workFlow
title OVarFlow: a resource optimized GATK 4 based Open source Variant calling workFlow
title_full OVarFlow: a resource optimized GATK 4 based Open source Variant calling workFlow
title_fullStr OVarFlow: a resource optimized GATK 4 based Open source Variant calling workFlow
title_full_unstemmed OVarFlow: a resource optimized GATK 4 based Open source Variant calling workFlow
title_short OVarFlow: a resource optimized GATK 4 based Open source Variant calling workFlow
title_sort ovarflow: a resource optimized gatk 4 based open source variant calling workflow
topic Software
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8361789/
https://www.ncbi.nlm.nih.gov/pubmed/34388963
http://dx.doi.org/10.1186/s12859-021-04317-y
work_keys_str_mv AT bathkejochen ovarflowaresourceoptimizedgatk4basedopensourcevariantcallingworkflow
AT luhkengesine ovarflowaresourceoptimizedgatk4basedopensourcevariantcallingworkflow