
Comparative analysis of whole-genome sequencing pipelines to minimize false negative findings

Comprehensive and accurate detection of variants from whole-genome sequencing (WGS) is a strong prerequisite for translational genomic medicine; however, low concordance between analytic pipelines is an outstanding challenge. We processed one European and one African WGS sample with 70 analytic pipeli...


Bibliographic Details
Main authors: Hwang, Kyu-Baek, Lee, In-Hee, Li, Honglan, Won, Dhong-Geon, Hernandez-Ferrer, Carles, Negron, Jose Alberto, Kong, Sek Won
Format: Online Article Text
Language: English
Published: Nature Publishing Group UK 2019
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6397176/
https://www.ncbi.nlm.nih.gov/pubmed/30824715
http://dx.doi.org/10.1038/s41598-019-39108-2
author Hwang, Kyu-Baek
Lee, In-Hee
Li, Honglan
Won, Dhong-Geon
Hernandez-Ferrer, Carles
Negron, Jose Alberto
Kong, Sek Won
author_facet Hwang, Kyu-Baek
Lee, In-Hee
Li, Honglan
Won, Dhong-Geon
Hernandez-Ferrer, Carles
Negron, Jose Alberto
Kong, Sek Won
author_sort Hwang, Kyu-Baek
collection PubMed
description Comprehensive and accurate detection of variants from whole-genome sequencing (WGS) is a strong prerequisite for translational genomic medicine; however, low concordance between analytic pipelines is an outstanding challenge. We processed one European and one African WGS sample with 70 analytic pipelines, each combining one of 7 short-read aligners with one of 10 variant calling algorithms (VCAs), and observed remarkable differences in the number of variants called by different pipelines (max/min ratio: 1.3–3.4). The similarity between variant call sets was determined more closely by VCAs than by short-read aligners. Remarkably, reported minor allele frequency had a substantial effect on concordance between pipelines (concordance rate ratio: 0.11–0.92; Wald tests, P < 0.001), entailing more discordant results for rare and novel variants. We compared the performance of analytic pipelines and pipeline ensembles using gold-standard variant call sets and the catalog of variants from the 1000 Genomes Project. Notably, a single pipeline using BWA-MEM and GATK-HaplotypeCaller performed comparably to the pipeline ensembles for ‘callable’ regions (~97%) of the human reference genome. While a single pipeline is capable of analyzing common variants in most genomic regions, our findings demonstrated the limitations and challenges in analyzing rare or novel variants, especially for non-European genomes.
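The pipeline count and the two summary quantities in the abstract can be illustrated with a minimal Python sketch. It rests on several assumptions: only BWA-MEM and GATK-HaplotypeCaller are named in the abstract, so the remaining tool names are hypothetical placeholders; the variant sets are toy data, not results from the study; and Jaccard similarity stands in for pairwise concordance, since the abstract does not define the concordance measure the authors used.

```python
from itertools import product

# Placeholder tool names (assumption): the abstract names only
# BWA-MEM and GATK-HaplotypeCaller; the rest are hypothetical labels.
aligners = ["BWA-MEM"] + [f"aligner_{i}" for i in range(2, 8)]
callers = ["GATK-HaplotypeCaller"] + [f"caller_{i}" for i in range(2, 11)]

# Every aligner paired with every variant caller: 7 x 10 = 70 pipelines.
pipelines = list(product(aligners, callers))


def max_min_ratio(call_sets):
    """Ratio of the largest to the smallest call-set size."""
    sizes = [len(variants) for variants in call_sets.values()]
    return max(sizes) / min(sizes)


def jaccard(a, b):
    """Shared variants divided by all variants seen in either set.

    Used here as a stand-in for concordance (assumption).
    """
    union = a | b
    return len(a & b) / len(union) if union else 1.0


# Toy call sets keyed by (aligner, caller); each variant is a
# (chromosome, position, ref, alt) tuple. Illustrative data only.
toy_calls = {
    ("BWA-MEM", "GATK-HaplotypeCaller"): {
        ("1", 100, "A", "G"),
        ("1", 200, "C", "T"),
        ("2", 50, "G", "A"),
    },
    ("aligner_2", "caller_2"): {
        ("1", 100, "A", "G"),
        ("2", 50, "G", "A"),
    },
}

ratio = max_min_ratio(toy_calls)
similarity = jaccard(*toy_calls.values())
```

Representing each variant as a hashable tuple lets ordinary set operations do the comparison; real call sets would first need normalization (for example, left-alignment of indels) before such a comparison is meaningful.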
format Online
Article
Text
id pubmed-6397176
institution National Center for Biotechnology Information
language English
publishDate 2019
publisher Nature Publishing Group UK
record_format MEDLINE/PubMed
spelling pubmed-6397176 2019-03-05 Comparative analysis of whole-genome sequencing pipelines to minimize false negative findings Hwang, Kyu-Baek Lee, In-Hee Li, Honglan Won, Dhong-Geon Hernandez-Ferrer, Carles Negron, Jose Alberto Kong, Sek Won Sci Rep Article
Nature Publishing Group UK 2019-03-01 /pmc/articles/PMC6397176/ /pubmed/30824715 http://dx.doi.org/10.1038/s41598-019-39108-2 Text en © The Author(s) 2019 Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.
spellingShingle Article
Hwang, Kyu-Baek
Lee, In-Hee
Li, Honglan
Won, Dhong-Geon
Hernandez-Ferrer, Carles
Negron, Jose Alberto
Kong, Sek Won
Comparative analysis of whole-genome sequencing pipelines to minimize false negative findings
title Comparative analysis of whole-genome sequencing pipelines to minimize false negative findings
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6397176/
https://www.ncbi.nlm.nih.gov/pubmed/30824715
http://dx.doi.org/10.1038/s41598-019-39108-2