Accelerating next generation sequencing data analysis: an evaluation of optimized best practices for Genome Analysis Toolkit algorithms

Advancements in next generation sequencing (NGS) technologies have significantly increased the translational use of genomics data in the medical field as well as the demand for computational infrastructure capable of processing that data. To enhance the current understanding of software and hardware us...

Bibliographic Details
Main Authors: Franke, Karl R., Crowgey, Erin L.
Format: Online Article Text
Language: English
Published: Korea Genome Organization 2020
Subjects: Clinical Genomics
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7120354/
https://www.ncbi.nlm.nih.gov/pubmed/32224843
http://dx.doi.org/10.5808/GI.2020.18.1.e10
author Franke, Karl R.
Crowgey, Erin L.
collection PubMed
description Advancements in next generation sequencing (NGS) technologies have significantly increased the translational use of genomics data in the medical field as well as the demand for computational infrastructure capable of processing that data. To enhance the current understanding of the software and hardware used to compute large-scale human genomic (NGS) datasets, the performance and accuracy of optimized versions of GATK algorithms, including Parabricks and Sentieon, were compared to the results of the original application (GATK v4.1.0, Intel x86 CPUs). Parabricks was able to process a 50× whole-genome sequencing library in under 3 h and Sentieon finished in under 8 h, whereas GATK v4.1.0 needed nearly 24 h. These results were achieved while maintaining greater than 99% accuracy and precision compared to stock GATK. Sentieon’s somatic pipeline achieved similar results, also greater than 99%. Additionally, the IBM POWER9 CPU performed well on bioinformatic workloads when tested with 10 different tools for alignment/mapping.
format Online
Article
Text
id pubmed-7120354
institution National Center for Biotechnology Information
language English
publishDate 2020
publisher Korea Genome Organization
record_format MEDLINE/PubMed
spelling pubmed-7120354 2020-04-09 Accelerating next generation sequencing data analysis: an evaluation of optimized best practices for Genome Analysis Toolkit algorithms Franke, Karl R. Crowgey, Erin L. Genomics Inform Clinical Genomics Korea Genome Organization 2020-03-31 /pmc/articles/PMC7120354/ /pubmed/32224843 http://dx.doi.org/10.5808/GI.2020.18.1.e10 Text en (c) 2020, Korea Genome Organization (CC) This is an open-access article distributed under the terms of the Creative Commons Attribution license (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title Accelerating next generation sequencing data analysis: an evaluation of optimized best practices for Genome Analysis Toolkit algorithms
topic Clinical Genomics