Systematic exploration of error sources in pyrosequencing flowgram data
Motivation: 454 pyrosequencing, by Roche Diagnostics, has emerged as an alternative to Sanger sequencing when it comes to read lengths, performance and cost, but shows higher per-base error rates. Although there are several tools available for noise removal, targeting different application fields, d...
Main authors: | Balzer, Susanne; Malde, Ketil; Jonassen, Inge |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Oxford University Press 2011 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3117331/ https://www.ncbi.nlm.nih.gov/pubmed/21685085 http://dx.doi.org/10.1093/bioinformatics/btr251 |
_version_ | 1782206316615303168 |
---|---|
author | Balzer, Susanne Malde, Ketil Jonassen, Inge |
author_facet | Balzer, Susanne Malde, Ketil Jonassen, Inge |
author_sort | Balzer, Susanne |
collection | PubMed |
description | Motivation: 454 pyrosequencing, by Roche Diagnostics, has emerged as an alternative to Sanger sequencing when it comes to read lengths, performance and cost, but shows higher per-base error rates. Although there are several tools available for noise removal, targeting different application fields, data interpretation would benefit from a better understanding of the different error types. Results: By exploring 454 raw data, we quantify to what extent different factors account for sequencing errors. In addition to the well-known homopolymer length inaccuracies, we have identified errors likely to originate from other stages of the sequencing process. We use our findings to extend the flowsim pipeline with functionalities to simulate these errors, and thus enable a more realistic simulation of 454 pyrosequencing data with flowsim. Availability: The flowsim pipeline is freely available under the General Public License from http://biohaskell.org/Applications/FlowSim. Contact: susanne.balzer@imr.no |
format | Online Article Text |
id | pubmed-3117331 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2011 |
publisher | Oxford University Press |
record_format | MEDLINE/PubMed |
spelling | pubmed-31173312011-06-17 Systematic exploration of error sources in pyrosequencing flowgram data Balzer, Susanne Malde, Ketil Jonassen, Inge Bioinformatics Ismb/Eccb 2011 Proceedings Papers Committee July 17 to July 19, 2011, Vienna, Austria Motivation: 454 pyrosequencing, by Roche Diagnostics, has emerged as an alternative to Sanger sequencing when it comes to read lengths, performance and cost, but shows higher per-base error rates. Although there are several tools available for noise removal, targeting different application fields, data interpretation would benefit from a better understanding of the different error types. Results: By exploring 454 raw data, we quantify to what extent different factors account for sequencing errors. In addition to the well-known homopolymer length inaccuracies, we have identified errors likely to originate from other stages of the sequencing process. We use our findings to extend the flowsim pipeline with functionalities to simulate these errors, and thus enable a more realistic simulation of 454 pyrosequencing data with flowsim. Availability: The flowsim pipeline is freely available under the General Public License from http://biohaskell.org/Applications/FlowSim. Contact: susanne.balzer@imr.no Oxford University Press 2011-07-01 2011-06-14 /pmc/articles/PMC3117331/ /pubmed/21685085 http://dx.doi.org/10.1093/bioinformatics/btr251 Text en © The Author(s) 2011. Published by Oxford University Press. http://creativecommons.org/licenses/by-nc/2.5 This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/2.5), which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited. |
spellingShingle | Ismb/Eccb 2011 Proceedings Papers Committee July 17 to July 19, 2011, Vienna, Austria Balzer, Susanne Malde, Ketil Jonassen, Inge Systematic exploration of error sources in pyrosequencing flowgram data |
title | Systematic exploration of error sources in pyrosequencing flowgram data |
title_full | Systematic exploration of error sources in pyrosequencing flowgram data |
title_fullStr | Systematic exploration of error sources in pyrosequencing flowgram data |
title_full_unstemmed | Systematic exploration of error sources in pyrosequencing flowgram data |
title_short | Systematic exploration of error sources in pyrosequencing flowgram data |
title_sort | systematic exploration of error sources in pyrosequencing flowgram data |
topic | Ismb/Eccb 2011 Proceedings Papers Committee July 17 to July 19, 2011, Vienna, Austria |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3117331/ https://www.ncbi.nlm.nih.gov/pubmed/21685085 http://dx.doi.org/10.1093/bioinformatics/btr251 |
work_keys_str_mv | AT balzersusanne systematicexplorationoferrorsourcesinpyrosequencingflowgramdata AT maldeketil systematicexplorationoferrorsourcesinpyrosequencingflowgramdata AT jonasseninge systematicexplorationoferrorsourcesinpyrosequencingflowgramdata |