Comparative analysis of whole-genome sequencing pipelines to minimize false negative findings
Main authors: | Hwang, Kyu-Baek; Lee, In-Hee; Li, Honglan; Won, Dhong-Geon; Hernandez-Ferrer, Carles; Negron, Jose Alberto; Kong, Sek Won |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Nature Publishing Group UK, 2019 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6397176/ https://www.ncbi.nlm.nih.gov/pubmed/30824715 http://dx.doi.org/10.1038/s41598-019-39108-2 |
author | Hwang, Kyu-Baek; Lee, In-Hee; Li, Honglan; Won, Dhong-Geon; Hernandez-Ferrer, Carles; Negron, Jose Alberto; Kong, Sek Won |
---|---|
collection | PubMed |
description | Comprehensive and accurate detection of variants from whole-genome sequencing (WGS) is a strong prerequisite for translational genomic medicine; however, low concordance between analytic pipelines is an outstanding challenge. We processed a European and an African WGS sample with 70 analytic pipelines, combining each of 7 short-read aligners with each of 10 variant calling algorithms (VCAs), and observed marked differences in the number of variants called by different pipelines (max/min ratio: 1.3~3.4). The similarity between variant call sets was determined more by the VCA than by the short-read aligner. Remarkably, reported minor allele frequency had a substantial effect on concordance between pipelines (concordance rate ratio: 0.11~0.92; Wald tests, P < 0.001), with more discordant results for rare and novel variants. We compared the performance of individual pipelines and pipeline ensembles using gold-standard variant call sets and the catalog of variants from the 1000 Genomes Project. Notably, a single pipeline using BWA-MEM and GATK-HaplotypeCaller performed comparably to the pipeline ensembles for ‘callable’ regions (~97%) of the human reference genome. While a single pipeline is capable of analyzing common variants in most genomic regions, our findings demonstrate the limitations and challenges in analyzing rare or novel variants, especially in non-European genomes. |
format | Online Article Text |
id | pubmed-6397176 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2019 |
publisher | Nature Publishing Group UK |
record_format | MEDLINE/PubMed |
spelling | pubmed-6397176 (2019-03-05). Sci Rep, Article. Nature Publishing Group UK, 2019-03-01. Text, en. © The Author(s) 2019. Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/. |
title | Comparative analysis of whole-genome sequencing pipelines to minimize false negative findings |
topic | Article |
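The comparison summarized in the abstract (every aligner paired with every VCA, pairwise concordance between call sets, the max/min ratio of call counts, and benchmarking against a gold-standard set to count false negatives) can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' code: the variant key (chromosome, position, ref, alt), the Jaccard index as the concordance measure, the vote-threshold ensemble rule, the toy call sets, and every function and pipeline name (`pipeline_grid`, `majority_ensemble`, `aligner_1`, ...) are hypothetical. Only BWA-MEM and GATK-HaplotypeCaller are actually named in the record.

```python
from itertools import product
from typing import Dict, FrozenSet, List, Tuple

# Hypothetical variant key: (chromosome, position, ref allele, alt allele).
Variant = Tuple[str, int, str, str]
CallSet = FrozenSet[Variant]


def pipeline_grid(aligners: List[str], callers: List[str]) -> List[Tuple[str, str]]:
    """Pair every aligner with every variant caller (7 x 10 = 70 pipelines in the study)."""
    return list(product(aligners, callers))


def concordance(a: CallSet, b: CallSet) -> float:
    """Jaccard index of two call sets (an assumed concordance measure)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def max_min_ratio(call_sets: Dict[str, CallSet]) -> float:
    """Largest call count divided by smallest call count across pipelines."""
    sizes = [len(calls) for calls in call_sets.values()]
    return max(sizes) / min(sizes)


def majority_ensemble(call_sets: Dict[str, CallSet], min_votes: int) -> CallSet:
    """Keep variants reported by at least `min_votes` pipelines (hypothetical ensemble rule)."""
    votes: Dict[Variant, int] = {}
    for calls in call_sets.values():
        for variant in calls:
            votes[variant] = votes.get(variant, 0) + 1
    return frozenset(v for v, n in votes.items() if n >= min_votes)


def evaluate(calls: CallSet, truth: CallSet) -> Dict[str, float]:
    """Score a call set against a gold standard; false negatives are truth variants not called."""
    tp = len(calls & truth)
    fp = len(calls - truth)
    fn = len(truth - calls)
    return {
        "tp": tp,
        "fp": fp,
        "fn": fn,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
        "precision": tp / (tp + fp) if tp + fp else 0.0,
    }


if __name__ == "__main__":
    # Placeholder names standing in for the 7 aligners and 10 VCAs.
    aligners = [f"aligner_{i}" for i in range(1, 8)]
    callers = [f"caller_{j}" for j in range(1, 11)]
    print(len(pipeline_grid(aligners, callers)))  # 70

    # Toy call sets for illustration only, not data from the study.
    truth = frozenset({("chr1", 100, "A", "G"), ("chr1", 200, "C", "T"), ("chr2", 50, "G", "A")})
    runs = {
        "p1": frozenset({("chr1", 100, "A", "G"), ("chr1", 200, "C", "T")}),
        "p2": frozenset({("chr1", 100, "A", "G"), ("chr2", 50, "G", "A"), ("chr3", 7, "T", "C")}),
        "p3": truth,
    }
    print(concordance(runs["p1"], runs["p2"]))  # 0.25
    print(max_min_ratio(runs))  # 1.5
    print(evaluate(runs["p1"], truth)["fn"])  # 1
    print(evaluate(majority_ensemble(runs, 2), truth)["fn"])  # 0
```

Lowering `min_votes` in the ensemble recovers more truth variants (fewer false negatives) at the cost of admitting more false positives, which is the trade-off the study probes. Real benchmarks usually also normalize variant representation and restrict scoring to high-confidence regions; this sketch omits both.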