OVarFlow: a resource optimized GATK 4 based Open source Variant calling workFlow
BACKGROUND: The advent of next generation sequencing has opened new avenues for basic and applied research. One application is the discovery of sequence variants causative of a phenotypic trait or a disease pathology. The computational task of detecting and annotating sequence differences of a targe...
Main authors: | Bathke, Jochen; Lühken, Gesine |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central 2021 |
Subjects: | Software |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8361789/ https://www.ncbi.nlm.nih.gov/pubmed/34388963 http://dx.doi.org/10.1186/s12859-021-04317-y |
_version_ | 1783738019520446464 |
---|---|
author | Bathke, Jochen Lühken, Gesine |
author_facet | Bathke, Jochen Lühken, Gesine |
author_sort | Bathke, Jochen |
collection | PubMed |
description | BACKGROUND: The advent of next generation sequencing has opened new avenues for basic and applied research. One application is the discovery of sequence variants causative of a phenotypic trait or a disease pathology. The computational task of detecting and annotating sequence differences of a target dataset relative to a reference genome is known as "variant calling". Typically, this task is computationally involved, often combining a complex chain of linked software tools. A major player in this field is the Genome Analysis Toolkit (GATK). The "GATK Best Practices" is a commonly referenced recipe for variant calling. However, current computational recommendations on variant calling predominantly focus on human sequencing data and ignore ever-changing demands of high-throughput sequencing developments. Furthermore, frequent updates to such recommendations are counterintuitive to the goal of offering a standard workflow and hamper reproducibility over time. RESULTS: A workflow for automated detection of single nucleotide polymorphisms and insertion-deletions offers a wide range of applications in sequence annotation of model and non-model organisms. The introduced workflow builds on the GATK Best Practices, while enabling reproducibility over time and offering an open, generalized computational architecture. The workflow achieves parallelized data evaluation and maximizes performance of individual computational tasks. Optimized Java garbage collection and heap size settings for the GATK applications SortSam, MarkDuplicates, HaplotypeCaller, and GatherVcfs effectively cut the overall analysis time in half. CONCLUSIONS: The demand for variant calling, efficient computational processing, and standardized workflows is growing. The Open source Variant calling workFlow (OVarFlow) offers automation and reproducibility for a computationally optimized variant calling task. By reducing usage of computational resources, the workflow removes pre-existing entry barriers to the variant calling field and enables standardized variant calling. |
format | Online Article Text |
id | pubmed-8361789 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2021 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-8361789 2021-08-17 OVarFlow: a resource optimized GATK 4 based Open source Variant calling workFlow Bathke, Jochen Lühken, Gesine BMC Bioinformatics Software BACKGROUND: The advent of next generation sequencing has opened new avenues for basic and applied research. One application is the discovery of sequence variants causative of a phenotypic trait or a disease pathology. The computational task of detecting and annotating sequence differences of a target dataset relative to a reference genome is known as "variant calling". Typically, this task is computationally involved, often combining a complex chain of linked software tools. A major player in this field is the Genome Analysis Toolkit (GATK). The "GATK Best Practices" is a commonly referenced recipe for variant calling. However, current computational recommendations on variant calling predominantly focus on human sequencing data and ignore ever-changing demands of high-throughput sequencing developments. Furthermore, frequent updates to such recommendations are counterintuitive to the goal of offering a standard workflow and hamper reproducibility over time. RESULTS: A workflow for automated detection of single nucleotide polymorphisms and insertion-deletions offers a wide range of applications in sequence annotation of model and non-model organisms. The introduced workflow builds on the GATK Best Practices, while enabling reproducibility over time and offering an open, generalized computational architecture. The workflow achieves parallelized data evaluation and maximizes performance of individual computational tasks. Optimized Java garbage collection and heap size settings for the GATK applications SortSam, MarkDuplicates, HaplotypeCaller, and GatherVcfs effectively cut the overall analysis time in half. CONCLUSIONS: The demand for variant calling, efficient computational processing, and standardized workflows is growing. The Open source Variant calling workFlow (OVarFlow) offers automation and reproducibility for a computationally optimized variant calling task. By reducing usage of computational resources, the workflow removes pre-existing entry barriers to the variant calling field and enables standardized variant calling. BioMed Central 2021-08-13 /pmc/articles/PMC8361789/ /pubmed/34388963 http://dx.doi.org/10.1186/s12859-021-04317-y Text en © The Author(s) 2021 https://creativecommons.org/licenses/by/4.0/ Open Access: This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data. |
spellingShingle | Software Bathke, Jochen Lühken, Gesine OVarFlow: a resource optimized GATK 4 based Open source Variant calling workFlow |
title | OVarFlow: a resource optimized GATK 4 based Open source Variant calling workFlow |
title_full | OVarFlow: a resource optimized GATK 4 based Open source Variant calling workFlow |
title_fullStr | OVarFlow: a resource optimized GATK 4 based Open source Variant calling workFlow |
title_full_unstemmed | OVarFlow: a resource optimized GATK 4 based Open source Variant calling workFlow |
title_short | OVarFlow: a resource optimized GATK 4 based Open source Variant calling workFlow |
title_sort | ovarflow: a resource optimized gatk 4 based open source variant calling workflow |
topic | Software |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8361789/ https://www.ncbi.nlm.nih.gov/pubmed/34388963 http://dx.doi.org/10.1186/s12859-021-04317-y |
work_keys_str_mv | AT bathkejochen ovarflowaresourceoptimizedgatk4basedopensourcevariantcallingworkflow AT luhkengesine ovarflowaresourceoptimizedgatk4basedopensourcevariantcallingworkflow |
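
The description above credits tuned Java garbage collection and heap size settings for much of the reported speed-up. As a rough illustration only, the sketch below builds a command line for the `gatk` wrapper script that passes JVM options through its `--java-options` argument. The heap size, garbage collector thread count, and file names are placeholder assumptions chosen for the example; they are not the values used by OVarFlow, and the helper `gatk_command` is hypothetical.

```python
import shlex


def gatk_command(tool, tool_args, heap_gb=4, gc_threads=2):
    """Build an argv list for the `gatk` wrapper with explicit JVM options.

    heap_gb and gc_threads are illustrative defaults, not values taken
    from the OVarFlow publication.
    """
    java_options = (
        f"-Xmx{heap_gb}g "                       # cap the Java heap size
        "-XX:+UseParallelGC "                    # select the parallel collector
        f"-XX:ParallelGCThreads={gc_threads}"    # limit garbage collection threads
    )
    return ["gatk", "--java-options", java_options, tool, *tool_args]


# Hypothetical input and output file names, for illustration only.
cmd = gatk_command(
    "HaplotypeCaller",
    ["-R", "reference.fasta", "-I", "sample.bam",
     "-O", "sample.g.vcf.gz", "-ERC", "GVCF"],
    heap_gb=2,
    gc_threads=2,
)

# Print a shell-safe rendering of the command instead of running it.
print(shlex.join(cmd))
```

Capping heap size and collector threads per process is what allows several GATK jobs to run side by side on one machine without competing for memory and cores; suitable values depend on the tool and the data and would need to be measured.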