Accuracy and efficiency of germline variant calling pipelines for human genome data
Advances in next-generation sequencing technology have enabled whole genome sequencing (WGS) to be widely used for identification of causal variants in a spectrum of genetic-related disorders, and provided new insight into how genetic polymorphisms affect disease phenotypes. The development of diffe...
Main authors: | Zhao, Sen; Agafonov, Oleg; Azab, Abdulrahman; Stokowy, Tomasz; Hovig, Eivind |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Nature Publishing Group UK 2020 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7678823/ https://www.ncbi.nlm.nih.gov/pubmed/33214604 http://dx.doi.org/10.1038/s41598-020-77218-4 |
_version_ | 1783612231039057920 |
---|---|
author | Zhao, Sen; Agafonov, Oleg; Azab, Abdulrahman; Stokowy, Tomasz; Hovig, Eivind |
author_facet | Zhao, Sen; Agafonov, Oleg; Azab, Abdulrahman; Stokowy, Tomasz; Hovig, Eivind |
author_sort | Zhao, Sen |
collection | PubMed |
description | Advances in next-generation sequencing technology have enabled whole genome sequencing (WGS) to be widely used for identification of causal variants in a spectrum of genetic-related disorders, and provided new insight into how genetic polymorphisms affect disease phenotypes. The development of different bioinformatics pipelines has continuously improved the variant analysis of WGS data. However, there is a necessity for a systematic performance comparison of these pipelines to provide guidance on the application of WGS-based scientific and clinical genomics. In this study, we evaluated the performance of three variant calling pipelines (GATK, DRAGEN and DeepVariant) using the Genome in a Bottle Consortium, “synthetic-diploid” and simulated WGS datasets. DRAGEN and DeepVariant show better accuracy in SNP and indel calling, with no significant differences in their F1-score. DRAGEN platform offers accuracy, flexibility and a highly-efficient execution speed, and therefore superior performance in the analysis of WGS data on a large scale. The combination of DRAGEN and DeepVariant also suggests a good balance of accuracy and efficiency as an alternative solution for germline variant detection in further applications. Our results facilitate the standardization of benchmarking analysis of bioinformatics pipelines for reliable variant detection, which is critical in genetics-based medical research and clinical applications. |
format | Online Article Text |
id | pubmed-7678823 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2020 |
publisher | Nature Publishing Group UK |
record_format | MEDLINE/PubMed |
spelling | pubmed-7678823 2020-11-23 Accuracy and efficiency of germline variant calling pipelines for human genome data Zhao, Sen; Agafonov, Oleg; Azab, Abdulrahman; Stokowy, Tomasz; Hovig, Eivind Sci Rep Article Advances in next-generation sequencing technology have enabled whole genome sequencing (WGS) to be widely used for identification of causal variants in a spectrum of genetic-related disorders, and provided new insight into how genetic polymorphisms affect disease phenotypes. The development of different bioinformatics pipelines has continuously improved the variant analysis of WGS data. However, there is a necessity for a systematic performance comparison of these pipelines to provide guidance on the application of WGS-based scientific and clinical genomics. In this study, we evaluated the performance of three variant calling pipelines (GATK, DRAGEN and DeepVariant) using the Genome in a Bottle Consortium, “synthetic-diploid” and simulated WGS datasets. DRAGEN and DeepVariant show better accuracy in SNP and indel calling, with no significant differences in their F1-score. DRAGEN platform offers accuracy, flexibility and a highly-efficient execution speed, and therefore superior performance in the analysis of WGS data on a large scale. The combination of DRAGEN and DeepVariant also suggests a good balance of accuracy and efficiency as an alternative solution for germline variant detection in further applications. Our results facilitate the standardization of benchmarking analysis of bioinformatics pipelines for reliable variant detection, which is critical in genetics-based medical research and clinical applications.
Nature Publishing Group UK 2020-11-19 /pmc/articles/PMC7678823/ /pubmed/33214604 http://dx.doi.org/10.1038/s41598-020-77218-4 Text en © The Author(s) 2020 Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. |
spellingShingle | Article Zhao, Sen; Agafonov, Oleg; Azab, Abdulrahman; Stokowy, Tomasz; Hovig, Eivind Accuracy and efficiency of germline variant calling pipelines for human genome data |
title | Accuracy and efficiency of germline variant calling pipelines for human genome data |
title_full | Accuracy and efficiency of germline variant calling pipelines for human genome data |
title_fullStr | Accuracy and efficiency of germline variant calling pipelines for human genome data |
title_full_unstemmed | Accuracy and efficiency of germline variant calling pipelines for human genome data |
title_short | Accuracy and efficiency of germline variant calling pipelines for human genome data |
title_sort | accuracy and efficiency of germline variant calling pipelines for human genome data |
topic | Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7678823/ https://www.ncbi.nlm.nih.gov/pubmed/33214604 http://dx.doi.org/10.1038/s41598-020-77218-4 |