Discovering motifs that induce sequencing errors

BACKGROUND: Elevated sequencing error rates are the most predominant obstacle in single-nucleotide polymorphism (SNP) detection, which is a major goal in the bulk of current studies using next-generation sequencing (NGS). Beyond routinely handled generic sources of errors, certain base calling error...

Bibliographic Details
Main authors: Allhoff, Manuel; Schönhuth, Alexander; Martin, Marcel; Costa, Ivan G; Rahmann, Sven; Marschall, Tobias
Format: Online Article Text
Language: English
Published: BioMed Central 2013
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3622629/
https://www.ncbi.nlm.nih.gov/pubmed/23735080
http://dx.doi.org/10.1186/1471-2105-14-S5-S1
_version_ 1782265856888143872
author Allhoff, Manuel
Schönhuth, Alexander
Martin, Marcel
Costa, Ivan G
Rahmann, Sven
Marschall, Tobias
author_sort Allhoff, Manuel
collection PubMed
description BACKGROUND: Elevated sequencing error rates are the most predominant obstacle in single-nucleotide polymorphism (SNP) detection, which is a major goal in the bulk of current studies using next-generation sequencing (NGS). Beyond routinely handled generic sources of errors, certain base calling errors relate to specific sequence patterns. Statistically principled ways to associate sequence patterns with base calling errors have not been previously described. Extant approaches either incur decisive losses in power, due to relating errors with individual genomic positions rather than motifs, or do not properly distinguish between motif-induced and sequence-unspecific sources of errors. RESULTS: Here, for the first time, we describe a statistically rigorous framework for the discovery of motifs that induce sequencing errors. We apply our method to several datasets from Illumina GA IIx, HiSeq 2000, and MiSeq sequencers. We confirm previously known error-causing sequence contexts and report new more specific ones. CONCLUSIONS: Checking for error-inducing motifs should be included into SNP calling pipelines to avoid false positives. To facilitate filtering of sets of putative SNPs, we provide tracks of error-prone genomic positions (in BED format). AVAILABILITY: http://discovering-cse.googlecode.com
format Online
Article
Text
id pubmed-3622629
institution National Center for Biotechnology Information
language English
publishDate 2013
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-3622629 2013-04-15 Discovering motifs that induce sequencing errors Allhoff, Manuel Schönhuth, Alexander Martin, Marcel Costa, Ivan G Rahmann, Sven Marschall, Tobias BMC Bioinformatics Proceedings BACKGROUND: Elevated sequencing error rates are the most predominant obstacle in single-nucleotide polymorphism (SNP) detection, which is a major goal in the bulk of current studies using next-generation sequencing (NGS). Beyond routinely handled generic sources of errors, certain base calling errors relate to specific sequence patterns. Statistically principled ways to associate sequence patterns with base calling errors have not been previously described. Extant approaches either incur decisive losses in power, due to relating errors with individual genomic positions rather than motifs, or do not properly distinguish between motif-induced and sequence-unspecific sources of errors. RESULTS: Here, for the first time, we describe a statistically rigorous framework for the discovery of motifs that induce sequencing errors. We apply our method to several datasets from Illumina GA IIx, HiSeq 2000, and MiSeq sequencers. We confirm previously known error-causing sequence contexts and report new more specific ones. CONCLUSIONS: Checking for error-inducing motifs should be included into SNP calling pipelines to avoid false positives. To facilitate filtering of sets of putative SNPs, we provide tracks of error-prone genomic positions (in BED format). AVAILABILITY: http://discovering-cse.googlecode.com BioMed Central 2013-04-10 /pmc/articles/PMC3622629/ /pubmed/23735080 http://dx.doi.org/10.1186/1471-2105-14-S5-S1 Text en Copyright © 2013 Allhoff et al.; licensee BioMed Central Ltd.
http://creativecommons.org/licenses/by/2.0 This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title Discovering motifs that induce sequencing errors
topic Proceedings