Systematic exploration of error sources in pyrosequencing flowgram data

Motivation: 454 pyrosequencing, by Roche Diagnostics, has emerged as an alternative to Sanger sequencing when it comes to read lengths, performance and cost, but shows higher per-base error rates. Although there are several tools available for noise removal, targeting different application fields, d...

Bibliographic Details
Main Authors: Balzer, Susanne; Malde, Ketil; Jonassen, Inge
Format: Online Article Text
Language: English
Published: Oxford University Press 2011
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3117331/
https://www.ncbi.nlm.nih.gov/pubmed/21685085
http://dx.doi.org/10.1093/bioinformatics/btr251
author Balzer, Susanne
Malde, Ketil
Jonassen, Inge
collection PubMed
description Motivation: 454 pyrosequencing, by Roche Diagnostics, has emerged as an alternative to Sanger sequencing when it comes to read lengths, performance and cost, but shows higher per-base error rates. Although there are several tools available for noise removal, targeting different application fields, data interpretation would benefit from a better understanding of the different error types. Results: By exploring 454 raw data, we quantify to what extent different factors account for sequencing errors. In addition to the well-known homopolymer length inaccuracies, we have identified errors likely to originate from other stages of the sequencing process. We use our findings to extend the flowsim pipeline with functionalities to simulate these errors, and thus enable a more realistic simulation of 454 pyrosequencing data with flowsim. Availability: The flowsim pipeline is freely available under the General Public License from http://biohaskell.org/Applications/FlowSim. Contact: susanne.balzer@imr.no
format Online
Article
Text
id pubmed-3117331
institution National Center for Biotechnology Information
language English
publishDate 2011
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-3117331 2011-06-17 Systematic exploration of error sources in pyrosequencing flowgram data Balzer, Susanne Malde, Ketil Jonassen, Inge Bioinformatics Ismb/Eccb 2011 Proceedings Papers Committee July 17 to July 19, 2011, Vienna, Austria Oxford University Press 2011-07-01 2011-06-14 /pmc/articles/PMC3117331/ /pubmed/21685085 http://dx.doi.org/10.1093/bioinformatics/btr251 Text en © The Author(s) 2011. Published by Oxford University Press. http://creativecommons.org/licenses/by-nc/2.5 This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/2.5), which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.
title Systematic exploration of error sources in pyrosequencing flowgram data
topic Ismb/Eccb 2011 Proceedings Papers Committee July 17 to July 19, 2011, Vienna, Austria