Using state machines to model the Ion Torrent sequencing process and to improve read error rates

Motivation: The importance of fast and affordable DNA sequencing methods for current day life sciences, medicine and biotechnology is hard to overstate. A major player is Ion Torrent, a pyrosequencing-like technology which produces flowgrams – sequences of incorporation values – which are converted...

Bibliographic Details
Main authors: Golan, David; Medvedev, Paul
Format: Online Article Text
Language: English
Published: Oxford University Press 2013
Subjects: Ismb/Eccb 2013 Proceedings Papers Committee July 21 to July 23, 2013, Berlin, Germany
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3694666/
https://www.ncbi.nlm.nih.gov/pubmed/23813003
http://dx.doi.org/10.1093/bioinformatics/btt212
author Golan, David
Medvedev, Paul
collection PubMed
description Motivation: The importance of fast and affordable DNA sequencing methods for current day life sciences, medicine and biotechnology is hard to overstate. A major player is Ion Torrent, a pyrosequencing-like technology which produces flowgrams – sequences of incorporation values – which are converted into nucleotide sequences by a base-calling algorithm. Because of its exploitation of ubiquitous semiconductor technology and innovation in chemistry, Ion Torrent has been gaining popularity since its debut in 2011. Despite the advantages, however, Ion Torrent read accuracy remains a significant concern. Results: We present FlowgramFixer, a new algorithm for converting flowgrams into reads. Our key observation is that the incorporation signals of neighboring flows, even after normalization and phase correction, carry considerable mutual information and are important in making the correct base-call. We therefore propose that base-calling of flowgrams should be done on a read-wide level, rather than one flow at a time. We show that this can be done in linear-time by combining a state machine with a Viterbi algorithm to find the nucleotide sequence that maximizes the likelihood of the observed flowgram. FlowgramFixer is applicable to any flowgram-based sequencing platform. We demonstrate FlowgramFixer’s superior performance on Ion Torrent Escherichia coli data, with a 4.8% improvement in the number of high-quality mapped reads and a 7.1% improvement in the number of uniquely mappable reads. Availability: Binaries and source code of FlowgramFixer are freely available at: http://www.cs.tau.ac.il/~davidgo5/flowgramfixer.html. Contact: davidgo5@post.tau.ac.il
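The read-wide base-calling idea in the description (a state machine combined with a Viterbi search for the most likely nucleotide sequence) can be illustrated with a toy sketch. This is not FlowgramFixer's actual model. The cyclic flow order, the Gaussian noise model, the constants SIGMA and MAX_HP, and the function names are all assumptions made for illustration. The state machine here encodes a single constraint: with a cyclic four-nucleotide flow order, a positive flow cannot be followed by three empty flows, because the next base must match one of the three other nucleotides.

```python
import math

FLOW_ORDER = "TACG"  # assumed cyclic flow order (illustrative only)
MAX_HP = 4           # assumed longest homopolymer considered per flow
SIGMA = 0.25         # assumed noise standard deviation of a signal


def log_emission(signal, hp_len):
    """Gaussian log-likelihood (up to a constant) of a signal given a homopolymer length."""
    return -((signal - hp_len) ** 2) / (2 * SIGMA ** 2)


def viterbi_base_call(flowgram):
    """Return (read, homopolymer_lengths) with maximal likelihood among valid flowgrams.

    State r = number of empty flows still allowed before a positive flow is
    required. A read may start with up to 3 empty flows; after any positive
    flow at most 2 empty flows may follow.
    """
    neg_inf = -math.inf
    score = [neg_inf, neg_inf, neg_inf, 0.0]  # start in state r = 3
    back = []  # back[i][r] = (previous state, homopolymer length at flow i)
    for signal in flowgram:
        new_score = [neg_inf] * 4
        pointers = [None] * 4
        for r_prev, s_prev in enumerate(score):
            if s_prev == neg_inf:
                continue
            for hp in range(MAX_HP + 1):
                if hp == 0:
                    if r_prev == 0:
                        continue  # another empty flow would be invalid
                    r_new = r_prev - 1
                else:
                    r_new = 2  # a positive flow resets the allowance
                cand = s_prev + log_emission(signal, hp)
                if cand > new_score[r_new]:
                    new_score[r_new] = cand
                    pointers[r_new] = (r_prev, hp)
        score = new_score
        back.append(pointers)

    # Trace back from the best final state.
    r = max(range(4), key=lambda k: score[k])
    hps = []
    for pointers in reversed(back):
        r, hp = pointers[r]
        hps.append(hp)
    hps.reverse()
    read = "".join(FLOW_ORDER[i % 4] * hp for i, hp in enumerate(hps))
    return read, hps


if __name__ == "__main__":
    print(viterbi_base_call([1.1, 0.4, 0.45, 0.4, 0.9]))
```

In this example, rounding each flow on its own would give homopolymer lengths 1, 0, 0, 0, 1, which no nucleotide sequence can produce under the assumed flow order. The read-wide search instead promotes the most ambiguous of the three weak signals (0.45) to a single base. The running time is linear in the number of flows, since each flow considers a fixed number of states and homopolymer lengths.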
format Online
Article
Text
id pubmed-3694666
institution National Center for Biotechnology Information
language English
publishDate 2013
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-3694666 2013-06-27 Using state machines to model the Ion Torrent sequencing process and to improve read error rates Golan, David Medvedev, Paul Bioinformatics Ismb/Eccb 2013 Proceedings Papers Committee July 21 to July 23, 2013, Berlin, Germany Oxford University Press 2013-07-01 2013-06-19 /pmc/articles/PMC3694666/ /pubmed/23813003 http://dx.doi.org/10.1093/bioinformatics/btt212 Text en © The Author 2013. Published by Oxford University Press. http://creativecommons.org/licenses/by-nc/3.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/3.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com
title Using state machines to model the Ion Torrent sequencing process and to improve read error rates
topic Ismb/Eccb 2013 Proceedings Papers Committee July 21 to July 23, 2013, Berlin, Germany