
Improved base-calling and quality scores for 454 sequencing based on a Hurdle Poisson model


Bibliographic Details
Main authors: De Beuf, Kristof; De Schrijver, Joachim; Thas, Olivier; Van Criekinge, Wim; Irizarry, Rafael A; Clement, Lieven
Format: Online Article Text
Language: English
Published: BioMed Central 2012
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3534400/
https://www.ncbi.nlm.nih.gov/pubmed/23151247
http://dx.doi.org/10.1186/1471-2105-13-303
author Beuf, Kristof De
Schrijver, Joachim De
Thas, Olivier
Criekinge, Wim Van
Irizarry, Rafael A
Clement, Lieven
collection PubMed
description BACKGROUND: 454 pyrosequencing is a commonly used massively parallel DNA sequencing technology with a wide variety of application fields such as epigenetics, metagenomics and transcriptomics. A well-known problem of this platform is its sensitivity to base-calling insertion and deletion errors, particularly in the presence of long homopolymers. In addition, the base-call quality scores are not informative with respect to whether an insertion or a deletion error is more likely. Surprisingly, not much effort has been devoted to the development of improved base-calling methods and more intuitive quality scores for this platform. RESULTS: We present HPCall, a 454 base-calling method based on a weighted Hurdle Poisson model. HPCall uses a probabilistic framework to call the homopolymer lengths in the sequence by modeling well-known 454 noise predictors. Base-calling quality is assessed based on estimated probabilities for each homopolymer length, which are easily transformed to useful quality scores. CONCLUSIONS: Using a reference data set of the Escherichia coli K-12 strain, we show that HPCall produces superior quality scores that are very informative towards possible insertion and deletion errors, while maintaining a base-calling accuracy that is better than the current one. Given the generality of the framework, HPCall has the potential to also adapt to other homopolymer-sensitive sequencing technologies.
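The abstract states that HPCall estimates a probability for each homopolymer length and turns those probabilities into quality scores that indicate whether an insertion or a deletion is more likely. The following is a minimal sketch of that idea, not the authors' implementation: the function names, the Phred-style transform, the probability floor and the example numbers are all assumptions made for illustration. The published model is a weighted Hurdle Poisson regression on 454 noise predictors; here the hurdle probability and Poisson rate are simply passed in as plain numbers.

```python
import math


def hurdle_poisson_pmf(k, p_zero, lam):
    """P(Y = k) for a hurdle Poisson: point mass p_zero at 0, zero-truncated Poisson for k >= 1.

    p_zero and lam are illustrative inputs; in a real base-caller they would be
    predicted per flow from covariates (assumption, not taken from the record).
    """
    if k < 0:
        return 0.0
    if k == 0:
        return p_zero
    poisson_k = math.exp(-lam) * lam ** k / math.factorial(k)
    # Renormalise the Poisson over k >= 1 and scale by the mass left after the hurdle.
    return (1.0 - p_zero) * poisson_k / (1.0 - math.exp(-lam))


def score_call(probs, floor=1e-10):
    """Turn per-length probabilities into a call plus Phred-style scores.

    probs[k] is the estimated probability that the homopolymer has length k.
    Returns (called_length, quality, p_true_longer, p_true_shorter).
    """
    call = max(range(len(probs)), key=probs.__getitem__)
    p_error = max(1.0 - probs[call], floor)      # floor avoids log10(0)
    quality = -10.0 * math.log10(p_error)        # assumed Phred convention
    p_true_longer = sum(probs[call + 1:])        # call would be an undercall (deletion)
    p_true_shorter = sum(probs[:call])           # call would be an overcall (insertion)
    return call, quality, p_true_longer, p_true_shorter


# Hypothetical probabilities for lengths 0..3 at a single flow.
example_probs = [0.01, 0.02, 0.90, 0.07]
call, quality, p_longer, p_shorter = score_call(example_probs)
```

In this hypothetical example the called length is 2, and the remaining probability mass sits mostly on length 3, so an undercall (deletion) would be the more likely error than an overcall (insertion).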
format Online
Article
Text
id pubmed-3534400
institution National Center for Biotechnology Information
language English
publishDate 2012
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-3534400 2013-01-03 BMC Bioinformatics Methodology Article BioMed Central 2012-11-15 /pmc/articles/PMC3534400/ /pubmed/23151247 http://dx.doi.org/10.1186/1471-2105-13-303 Text en Copyright ©2012 De Beuf et al.; licensee BioMed Central Ltd.
http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title Improved base-calling and quality scores for 454 sequencing based on a Hurdle Poisson model
topic Methodology Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3534400/
https://www.ncbi.nlm.nih.gov/pubmed/23151247
http://dx.doi.org/10.1186/1471-2105-13-303