
Sequencing error correction without a reference genome

BACKGROUND: Next (second) generation sequencing is an increasingly important tool for many areas of molecular biology; however, care must be taken when interpreting its output. Even a low error rate can cause a large number of errors due to the high number of nucleotides being sequenced. Identifying...


Bibliographic Details
Main Authors: Sleep, Julie A; Schreiber, Andreas W; Baumann, Ute
Format: Online Article Text
Language: English
Published: BioMed Central 2013
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3879328/
https://www.ncbi.nlm.nih.gov/pubmed/24350580
http://dx.doi.org/10.1186/1471-2105-14-367
author Sleep, Julie A
Schreiber, Andreas W
Baumann, Ute
collection PubMed
description BACKGROUND: Next (second) generation sequencing is an increasingly important tool for many areas of molecular biology; however, care must be taken when interpreting its output. Even a low error rate can cause a large number of errors due to the high number of nucleotides being sequenced. Distinguishing sequencing errors from true biological variants is a challenging task, and it is harder still for organisms without a reference genome. RESULTS: We have developed a method for the correction of sequencing errors in data from the Illumina Solexa sequencing platforms. It does not require a reference genome and is of relevance for microRNA studies, unsequenced genomes, variant detection in ultra-deep sequencing and even for RNA-Seq studies of organisms with sequenced genomes where RNA editing is being considered. CONCLUSIONS: The derived error model is novel in that it allows different error probabilities for each position along the read, in conjunction with different error rates depending on the particular nucleotides involved in the substitution, and does not force these effects to behave in a multiplicative manner. The model provides error rates which capture the complex effects and interactions of the three main known causes of sequencing error associated with the Illumina platforms.
format Online
Article
Text
id pubmed-3879328
institution National Center for Biotechnology Information
language English
publishDate 2013
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-3879328 2014-01-09 BMC Bioinformatics Methodology Article BioMed Central 2013-12-18 /pmc/articles/PMC3879328/ /pubmed/24350580 http://dx.doi.org/10.1186/1471-2105-14-367 Text en Copyright © 2013 Sleep et al.; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title Sequencing error correction without a reference genome
topic Methodology Article