Improved base-calling and quality scores for 454 sequencing based on a Hurdle Poisson model
BACKGROUND: 454 pyrosequencing is a commonly used massively parallel DNA sequencing technology with a wide variety of application fields such as epigenetics, metagenomics and transcriptomics. A well-known problem of this platform is its sensitivity to base-calling insertion and deletion errors, part...
Main authors: | Beuf, Kristof De; Schrijver, Joachim De; Thas, Olivier; Criekinge, Wim Van; Irizarry, Rafael A; Clement, Lieven |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central 2012 |
Subjects: | Methodology Article |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3534400/ https://www.ncbi.nlm.nih.gov/pubmed/23151247 http://dx.doi.org/10.1186/1471-2105-13-303 |
_version_ | 1782475331411640320 |
---|---|
author | Beuf, Kristof De; Schrijver, Joachim De; Thas, Olivier; Criekinge, Wim Van; Irizarry, Rafael A; Clement, Lieven |
author_facet | Beuf, Kristof De; Schrijver, Joachim De; Thas, Olivier; Criekinge, Wim Van; Irizarry, Rafael A; Clement, Lieven |
author_sort | Beuf, Kristof De |
collection | PubMed |
description | BACKGROUND: 454 pyrosequencing is a commonly used massively parallel DNA sequencing technology with a wide variety of application fields such as epigenetics, metagenomics and transcriptomics. A well-known problem of this platform is its sensitivity to base-calling insertion and deletion errors, particularly in the presence of long homopolymers. In addition, the base-call quality scores are not informative with respect to whether an insertion or a deletion error is more likely. Surprisingly, not much effort has been devoted to the development of improved base-calling methods and more intuitive quality scores for this platform. RESULTS: We present HPCall, a 454 base-calling method based on a weighted Hurdle Poisson model. HPCall uses a probabilistic framework to call the homopolymer lengths in the sequence by modeling well-known 454 noise predictors. Base-calling quality is assessed based on estimated probabilities for each homopolymer length, which are easily transformed to useful quality scores. CONCLUSIONS: Using a reference data set of the Escherichia coli K-12 strain, we show that HPCall produces superior quality scores that are very informative towards possible insertion and deletion errors, while maintaining a base-calling accuracy that is better than the current one. Given the generality of the framework, HPCall has the potential to also adapt to other homopolymer-sensitive sequencing technologies. |
format | Online Article Text |
id | pubmed-3534400 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2012 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-3534400 2013-01-03 Improved base-calling and quality scores for 454 sequencing based on a Hurdle Poisson model Beuf, Kristof De; Schrijver, Joachim De; Thas, Olivier; Criekinge, Wim Van; Irizarry, Rafael A; Clement, Lieven BMC Bioinformatics Methodology Article BioMed Central 2012-11-15 /pmc/articles/PMC3534400/ /pubmed/23151247 http://dx.doi.org/10.1186/1471-2105-13-303 Text en Copyright ©2012 De Beuf et al.; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. |
title | Improved base-calling and quality scores for 454 sequencing based on a Hurdle Poisson model |
title_sort | improved base-calling and quality scores for 454 sequencing based on a hurdle poisson model |
topic | Methodology Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3534400/ https://www.ncbi.nlm.nih.gov/pubmed/23151247 http://dx.doi.org/10.1186/1471-2105-13-303 |
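
The description above mentions calling homopolymer lengths from a Hurdle Poisson model and turning the estimated probabilities into quality scores. The following is a minimal sketch of that general idea, not HPCall's implementation: it uses the textbook hurdle Poisson form (a point mass at zero plus a zero-truncated Poisson) and a Phred-style transformation. The function names, the parameters `p_zero` and `lam`, the length cap `max_len`, and the score cap `q_cap` are all assumptions made for illustration; the record does not specify how HPCall estimates or weights these quantities.

```python
import math


def hurdle_poisson_pmf(k, p_zero, lam):
    """P(Y = k) for a generic hurdle Poisson distribution.

    p_zero is the probability of a zero count (the hurdle part);
    lam is the rate of the zero-truncated Poisson used for k >= 1.
    Both are hypothetical inputs, not values taken from the paper.
    """
    if k < 0:
        return 0.0
    if k == 0:
        return p_zero
    poisson_k = math.exp(-lam) * lam ** k / math.factorial(k)
    # Rescale so the positive counts share the remaining mass 1 - p_zero.
    return (1.0 - p_zero) * poisson_k / (1.0 - math.exp(-lam))


def call_with_quality(p_zero, lam, max_len=20, q_cap=40.0):
    """Pick the most probable homopolymer length and a Phred-style score.

    The score is -10 * log10(1 - P(called length)), capped at q_cap.
    This scoring rule is an assumption for illustration only.
    """
    probs = [hurdle_poisson_pmf(k, p_zero, lam) for k in range(max_len + 1)]
    total = sum(probs)
    probs = [p / total for p in probs]  # renormalise after truncating at max_len
    best = max(range(len(probs)), key=probs.__getitem__)
    p_error = 1.0 - probs[best]
    if p_error <= 0.0:
        quality = q_cap
    else:
        quality = min(q_cap, -10.0 * math.log10(p_error))
    return best, quality, probs


if __name__ == "__main__":
    length, quality, probs = call_with_quality(p_zero=0.05, lam=2.5)
    below = sum(probs[:length])       # mass on shorter lengths than the call
    above = sum(probs[length + 1:])   # mass on longer lengths than the call
    print(length, round(quality, 2), round(below, 3), round(above, 3))
```

Keeping the full probability vector, rather than only the top call, is what allows the mass below and above the called length to be compared separately, which is the kind of insertion-versus-deletion information the description says conventional quality scores lack.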