
GLM-based optimization of NGS data analysis: A case study of Roche 454, Ion Torrent PGM and Illumina NextSeq sequencing data

BACKGROUND: There are various next-generation sequencing techniques, all of them striving to replace Sanger sequencing as the gold standard. However, false positive calls of single nucleotide variants and especially indels are a widely known problem of basically all sequencing platforms. METHODS: We...


Bibliographic Details
Main Authors: Sandmann, Sarah; de Graaf, Aniek O.; van der Reijden, Bert A.; Jansen, Joop H.; Dugas, Martin
Format: Online Article Text
Language: English
Published: Public Library of Science 2017
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5319672/
https://www.ncbi.nlm.nih.gov/pubmed/28222155
http://dx.doi.org/10.1371/journal.pone.0171983
author Sandmann, Sarah
de Graaf, Aniek O.
van der Reijden, Bert A.
Jansen, Joop H.
Dugas, Martin
collection PubMed
description BACKGROUND: There are various next-generation sequencing techniques, all of them striving to replace Sanger sequencing as the gold standard. However, false positive calls of single nucleotide variants and especially indels are a widely known problem of basically all sequencing platforms. METHODS: We considered three common next-generation sequencers—Roche 454, Ion Torrent PGM and Illumina NextSeq—and applied standard as well as optimized variant calling pipelines. Optimization was achieved by combining information of 23 diverse parameters characterizing the reported variants and generating individually calibrated generalized linear models. Models were calibrated using amplicon-based targeted sequencing data (19 genes, 28,775 bp) from seven to 12 myelodysplastic syndrome patients. Evaluation of the optimized pipelines and platforms was performed using sequencing data from three additional myelodysplastic syndrome patients. RESULTS: Using standard analysis methods, true mutations were missed and the obtained results contained many artifacts—no matter which platform was considered. Analysis of the parameters characterizing the true and false positive calls revealed significant platform- and variant specific differences. Application of optimized variant calling pipelines considerably improved results. 76% of all false positive single nucleotide variants and 97% of all false positive indels could be filtered out. Positive predictive values could be increased by factors of 1.07 to 1.27 in case of single nucleotide variant calling and by factors of 3.33 to 53.87 in case of indel calling. Application of the optimized variant calling pipelines leads to comparable results for all next-generation sequencing platforms analyzed. However, regarding clinical diagnostics it needs to be considered that even the optimized results still contained false positive as well as false negative calls.
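The filtering idea described in the abstract can be illustrated with a minimal sketch: fit a binomial GLM (logistic regression) on calls labelled as true variants or artifacts, then keep only calls whose predicted probability of being true exceeds a threshold, and compare positive predictive value (PPV = TP / (TP + FP)) before and after filtering. This is not the authors' implementation. The record does not list the 23 parameters, so the two features used here (variant allele frequency and mean base quality), the synthetic data, the 0.5 threshold, and all function names are assumptions made purely for illustration.

```python
import numpy as np


def sigmoid(eta):
    """Logistic function; eta is clipped to avoid overflow in exp."""
    return 1.0 / (1.0 + np.exp(-np.clip(eta, -30.0, 30.0)))


def fit_logistic_glm(X, y, n_iter=25, ridge=1e-6):
    """Fit a binomial GLM with logit link by iteratively reweighted least squares.

    A tiny ridge term is added to the normal equations for numerical stability.
    """
    X1 = np.column_stack([np.ones(len(X)), X])  # prepend intercept column
    beta = np.zeros(X1.shape[1])
    for _ in range(n_iter):
        eta = X1 @ beta
        mu = sigmoid(eta)
        w = np.clip(mu * (1.0 - mu), 1e-9, None)  # IRLS weights
        z = eta + (y - mu) / w                    # working response
        A = X1.T @ (w[:, None] * X1) + ridge * np.eye(X1.shape[1])
        beta = np.linalg.solve(A, X1.T @ (w * z))
    return beta


def predict_proba(beta, X):
    """Predicted probability that each call is a true variant."""
    X1 = np.column_stack([np.ones(len(X)), X])
    return sigmoid(X1 @ beta)


def ppv(tp, fp):
    """Positive predictive value: TP / (TP + FP)."""
    return tp / (tp + fp) if (tp + fp) > 0 else float("nan")


def simulate_calls(rng, n):
    """Synthetic calls with two hypothetical features: [allele frequency, base quality]."""
    true_calls = np.column_stack([rng.normal(0.40, 0.15, n), rng.normal(32.0, 4.0, n)])
    artifacts = np.column_stack([rng.normal(0.10, 0.10, n), rng.normal(26.0, 4.0, n)])
    X = np.vstack([true_calls, artifacts])
    y = np.concatenate([np.ones(n), np.zeros(n)])  # 1 = true variant, 0 = artifact
    return X, y


rng = np.random.default_rng(0)
X_train, y_train = simulate_calls(rng, 200)  # stands in for the calibration patients
X_eval, y_eval = simulate_calls(rng, 200)    # stands in for the evaluation patients

# Standardize with calibration-set statistics only.
mean, std = X_train.mean(axis=0), X_train.std(axis=0)
beta = fit_logistic_glm((X_train - mean) / std, y_train)

prob_true = predict_proba(beta, (X_eval - mean) / std)
keep = prob_true >= 0.5  # assumed threshold

raw_ppv = ppv(tp=int(y_eval.sum()), fp=int((1 - y_eval).sum()))
filtered_ppv = ppv(tp=int(y_eval[keep].sum()), fp=int((1 - y_eval[keep]).sum()))
print(f"PPV before filtering: {raw_ppv:.2f}, after filtering: {filtered_ppv:.2f}")
print(f"PPV improvement factor: {filtered_ppv / raw_ppv:.2f}")
```

The sketch calibrates on one synthetic set and evaluates on another, mirroring the abstract's separation of calibration and evaluation patients. The ratio printed at the end corresponds to the kind of "factor" by which the abstract reports PPV increases, though the numbers produced here say nothing about the study's actual results.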
format Online
Article
Text
id pubmed-5319672
institution National Center for Biotechnology Information
language English
publishDate 2017
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-5319672 2017-03-03 PLoS One Research Article Public Library of Science 2017-02-21 /pmc/articles/PMC5319672/ /pubmed/28222155 http://dx.doi.org/10.1371/journal.pone.0171983 Text en © 2017 Sandmann et al. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
title GLM-based optimization of NGS data analysis: A case study of Roche 454, Ion Torrent PGM and Illumina NextSeq sequencing data
topic Research Article