Accuracy and efficiency of germline variant calling pipelines for human genome data

Advances in next-generation sequencing technology have enabled whole genome sequencing (WGS) to be widely used for identification of causal variants in a spectrum of genetic-related disorders, and provided new insight into how genetic polymorphisms affect disease phenotypes. The development of diffe...

Bibliographic Details
Main authors: Zhao, Sen; Agafonov, Oleg; Azab, Abdulrahman; Stokowy, Tomasz; Hovig, Eivind
Format: Online Article Text
Language: English
Published: Nature Publishing Group UK 2020
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7678823/
https://www.ncbi.nlm.nih.gov/pubmed/33214604
http://dx.doi.org/10.1038/s41598-020-77218-4
_version_ 1783612231039057920
author Zhao, Sen
Agafonov, Oleg
Azab, Abdulrahman
Stokowy, Tomasz
Hovig, Eivind
collection PubMed
description Advances in next-generation sequencing technology have enabled whole genome sequencing (WGS) to be widely used for identification of causal variants in a spectrum of genetic-related disorders, and provided new insight into how genetic polymorphisms affect disease phenotypes. The development of different bioinformatics pipelines has continuously improved the variant analysis of WGS data. However, there is a necessity for a systematic performance comparison of these pipelines to provide guidance on the application of WGS-based scientific and clinical genomics. In this study, we evaluated the performance of three variant calling pipelines (GATK, DRAGEN and DeepVariant) using the Genome in a Bottle Consortium, “synthetic-diploid” and simulated WGS datasets. DRAGEN and DeepVariant show better accuracy in SNP and indel calling, with no significant differences in their F1-score. DRAGEN platform offers accuracy, flexibility and a highly-efficient execution speed, and therefore superior performance in the analysis of WGS data on a large scale. The combination of DRAGEN and DeepVariant also suggests a good balance of accuracy and efficiency as an alternative solution for germline variant detection in further applications. Our results facilitate the standardization of benchmarking analysis of bioinformatics pipelines for reliable variant detection, which is critical in genetics-based medical research and clinical applications.
format Online
Article
Text
id pubmed-7678823
institution National Center for Biotechnology Information
language English
publishDate 2020
publisher Nature Publishing Group UK
record_format MEDLINE/PubMed
spelling pubmed-7678823 2020-11-23 Accuracy and efficiency of germline variant calling pipelines for human genome data Zhao, Sen Agafonov, Oleg Azab, Abdulrahman Stokowy, Tomasz Hovig, Eivind Sci Rep Article
Nature Publishing Group UK 2020-11-19 /pmc/articles/PMC7678823/ /pubmed/33214604 http://dx.doi.org/10.1038/s41598-020-77218-4 Text en © The Author(s) 2020 Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
title Accuracy and efficiency of germline variant calling pipelines for human genome data
topic Article