Systematic comparison of germline variant calling pipelines cross multiple next-generation sequencers
The development of next-generation sequencing (NGS) and of the subsequent analysis tools has made them popular in scientific research and clinical diagnostics. Hence, a systematic comparison of the sequencing platforms and variant calling pipelines could provide significant...
Main authors: | Chen, Jiayun; Li, Xingsong; Zhong, Hongbin; Meng, Yuhuan; Du, Hongli |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Nature Publishing Group UK 2019 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6597787/ https://www.ncbi.nlm.nih.gov/pubmed/31249349 http://dx.doi.org/10.1038/s41598-019-45835-3 |
_version_ | 1783430651002748928 |
---|---|
author | Chen, Jiayun Li, Xingsong Zhong, Hongbin Meng, Yuhuan Du, Hongli |
author_facet | Chen, Jiayun Li, Xingsong Zhong, Hongbin Meng, Yuhuan Du, Hongli |
author_sort | Chen, Jiayun |
collection | PubMed |
description | The development of next-generation sequencing (NGS) and of the subsequent analysis tools has made them popular in scientific research and clinical diagnostics. Hence, a systematic comparison of the sequencing platforms and variant calling pipelines could provide significant guidance to NGS-based scientific and clinical genomics. In this study, we compared the performance, concordance and operating efficiency of 27 combinations of sequencing platforms and variant calling pipelines. We tested three variant calling pipelines (Genome Analysis Toolkit HaplotypeCaller, Strelka2 and Samtools-Varscan2) on nine data sets for the NA12878 genome sequenced on different platforms, including BGISEQ500, MGISEQ2000, HiSeq4000, NovaSeq and HiSeq Xten. Across the 12 combinations in the WES datasets, all performed well in calling SNPs, with F-scores all higher than 0.96, while their performance in calling INDELs ranged from 0.75 to 0.91. All 15 combinations in the WGS datasets also performed well, with F-scores for SNP calling all higher than 0.975 and performance in calling INDELs ranging from 0.71 to 0.93. All of these combinations showed high concordance in variant identification, although the divergence in variant identification was larger in the WGS datasets than in the WES datasets. We also down-sampled the original WES and WGS datasets to a series of coverage gradients across multiple platforms and measured the time each of the three pipelines took to call variants at each coverage. For the GIAB datasets on both BGI and Illumina platforms, Strelka2 outperformed the other two pipelines in detection accuracy and processing efficiency on each sequencing platform, and is therefore recommended for the further promotion and application of next-generation sequencing technology. The results of our research will provide useful and comprehensive guidelines for individual researchers and organizations seeking reliable and consistent variant identification. |
format | Online Article Text |
id | pubmed-6597787 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2019 |
publisher | Nature Publishing Group UK |
record_format | MEDLINE/PubMed |
spelling | pubmed-6597787 2019-07-09 Systematic comparison of germline variant calling pipelines cross multiple next-generation sequencers Chen, Jiayun Li, Xingsong Zhong, Hongbin Meng, Yuhuan Du, Hongli Sci Rep Article Nature Publishing Group UK 2019-06-27 /pmc/articles/PMC6597787/ /pubmed/31249349 http://dx.doi.org/10.1038/s41598-019-45835-3 Text en © The Author(s) 2019 Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/. |
spellingShingle | Article Chen, Jiayun Li, Xingsong Zhong, Hongbin Meng, Yuhuan Du, Hongli Systematic comparison of germline variant calling pipelines cross multiple next-generation sequencers |
title | Systematic comparison of germline variant calling pipelines cross multiple next-generation sequencers |
topic | Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6597787/ https://www.ncbi.nlm.nih.gov/pubmed/31249349 http://dx.doi.org/10.1038/s41598-019-45835-3 |