Reliability of genomic variants across different next-generation sequencing platforms and bioinformatic processing pipelines

BACKGROUND: Next Generation Sequencing (NGS) is the fundament of various studies, providing insights into questions from biology and medicine. Nevertheless, integrating data from different experimental backgrounds can introduce strong biases. In order to methodically investigate the magnitude of sys...

Bibliographic Details
Main authors: Weißbach, Stephan; Sys, Stanislav; Hewel, Charlotte; Todorov, Hristo; Schweiger, Susann; Winter, Jennifer; Pfenninger, Markus; Torkamani, Ali; Evans, Doug; Burger, Joachim; Everschor-Sitte, Karin; May-Simera, Helen Louise; Gerber, Susanne
Format: Online Article Text
Language: English
Published: BioMed Central 2021
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7814447/
https://www.ncbi.nlm.nih.gov/pubmed/33468057
http://dx.doi.org/10.1186/s12864-020-07362-8
_version_ 1783638057678798848
author Weißbach, Stephan
Sys, Stanislav
Hewel, Charlotte
Todorov, Hristo
Schweiger, Susann
Winter, Jennifer
Pfenninger, Markus
Torkamani, Ali
Evans, Doug
Burger, Joachim
Everschor-Sitte, Karin
May-Simera, Helen Louise
Gerber, Susanne
author_sort Weißbach, Stephan
collection PubMed
description BACKGROUND: Next Generation Sequencing (NGS) is the fundament of various studies, providing insights into questions from biology and medicine. Nevertheless, integrating data from different experimental backgrounds can introduce strong biases. In order to methodically investigate the magnitude of systematic errors in single nucleotide variant calls, we performed a cross-sectional observational study on a genomic cohort of 99 subjects each sequenced via (i) Illumina HiSeq X, (ii) Illumina HiSeq, and (iii) Complete Genomics and processed with the respective bioinformatic pipeline. We also repeated variant calling for the Illumina cohorts with GATK, which allowed us to investigate the effect of the bioinformatics analysis strategy separately from the sequencing platform’s impact. RESULTS: The number of detected variants/variant classes per individual was highly dependent on the experimental setup. We observed a statistically significant overrepresentation of variants uniquely called by a single setup, indicating potential systematic biases. Insertion/deletion polymorphisms (indels) were associated with decreased concordance compared to single nucleotide polymorphisms (SNPs). The discrepancies in indel absolute numbers were particularly prominent in introns, Alu elements, simple repeats, and regions with medium GC content. Notably, reprocessing sequencing data following the best practice recommendations of GATK considerably improved concordance between the respective setups. CONCLUSION: We provide empirical evidence of systematic heterogeneity in variant calls between alternative experimental and data analysis setups. Furthermore, our results demonstrate the benefit of reprocessing genomic data with harmonized pipelines when integrating data from different studies. SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1186/s12864-020-07362-8.
format Online
Article
Text
id pubmed-7814447
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-7814447 2021-01-19 BMC Genomics Research Article BioMed Central 2021-01-19 /pmc/articles/PMC7814447/ /pubmed/33468057 http://dx.doi.org/10.1186/s12864-020-07362-8 Text en © The Author(s) 2021 Open Access. This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
title Reliability of genomic variants across different next-generation sequencing platforms and bioinformatic processing pipelines
topic Research Article