Base calling for high-throughput short-read sequencing: dynamic programming solutions

BACKGROUND: Next-generation DNA sequencing platforms are capable of generating millions of reads in a matter of days at rapidly reducing costs. Despite its proliferation and technological improvements, the performance of next-generation sequencing remains adversely affected by the imperfections in t...

Bibliographic Details
Main Authors: Das, Shreepriya, Vikalo, Haris
Format: Online Article Text
Language: English
Published: BioMed Central 2013
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3776450/
https://www.ncbi.nlm.nih.gov/pubmed/23586484
http://dx.doi.org/10.1186/1471-2105-14-129
_version_ 1782477486437695488
author Das, Shreepriya
Vikalo, Haris
author_facet Das, Shreepriya
Vikalo, Haris
author_sort Das, Shreepriya
collection PubMed
description BACKGROUND: Next-generation DNA sequencing platforms are capable of generating millions of reads in a matter of days at rapidly reducing costs. Despite its proliferation and technological improvements, the performance of next-generation sequencing remains adversely affected by the imperfections in the underlying biochemical and signal acquisition procedures. To this end, various techniques, including statistical methods, are used to improve read lengths and accuracy of these systems. Development of high performing base calling algorithms that are computationally efficient and scalable is an ongoing challenge. RESULTS: We develop model-based statistical methods for fast and accurate base calling in Illumina’s next-generation sequencing platforms. In particular, we propose a computationally tractable parametric model which enables dynamic programming formulation of the base calling problem. Forward-backward and soft-output Viterbi algorithms are developed, and their performance and complexity are investigated and compared with the existing state-of-the-art base calling methods for this platform. A C code implementation of our algorithm named Softy can be downloaded from https://sourceforge.net/projects/dynamicprog. CONCLUSION: We demonstrate high accuracy and speed of the proposed methods on reads obtained using Illumina’s Genome Analyzer II and HiSeq2000. In addition to performing reliable and fast base calling, the developed algorithms enable incorporation of prior knowledge which can be utilized for parameter estimation and is potentially beneficial in various downstream applications.
format Online
Article
Text
id pubmed-3776450
institution National Center for Biotechnology Information
language English
publishDate 2013
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-37764502013-09-19 Base calling for high-throughput short-read sequencing: dynamic programming solutions Das, Shreepriya Vikalo, Haris BMC Bioinformatics Methodology Article BACKGROUND: Next-generation DNA sequencing platforms are capable of generating millions of reads in a matter of days at rapidly reducing costs. Despite its proliferation and technological improvements, the performance of next-generation sequencing remains adversely affected by the imperfections in the underlying biochemical and signal acquisition procedures. To this end, various techniques, including statistical methods, are used to improve read lengths and accuracy of these systems. Development of high performing base calling algorithms that are computationally efficient and scalable is an ongoing challenge. RESULTS: We develop model-based statistical methods for fast and accurate base calling in Illumina’s next-generation sequencing platforms. In particular, we propose a computationally tractable parametric model which enables dynamic programming formulation of the base calling problem. Forward-backward and soft-output Viterbi algorithms are developed, and their performance and complexity are investigated and compared with the existing state-of-the-art base calling methods for this platform. A C code implementation of our algorithm named Softy can be downloaded from https://sourceforge.net/projects/dynamicprog. CONCLUSION: We demonstrate high accuracy and speed of the proposed methods on reads obtained using Illumina’s Genome Analyzer II and HiSeq2000. In addition to performing reliable and fast base calling, the developed algorithms enable incorporation of prior knowledge which can be utilized for parameter estimation and is potentially beneficial in various downstream applications. BioMed Central 2013-04-15 /pmc/articles/PMC3776450/ /pubmed/23586484 http://dx.doi.org/10.1186/1471-2105-14-129 Text en Copyright © 2013 Das and Vikalo; licensee BioMed Central Ltd. 
http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
spellingShingle Methodology Article
Das, Shreepriya
Vikalo, Haris
Base calling for high-throughput short-read sequencing: dynamic programming solutions
title Base calling for high-throughput short-read sequencing: dynamic programming solutions
title_full Base calling for high-throughput short-read sequencing: dynamic programming solutions
title_fullStr Base calling for high-throughput short-read sequencing: dynamic programming solutions
title_full_unstemmed Base calling for high-throughput short-read sequencing: dynamic programming solutions
title_short Base calling for high-throughput short-read sequencing: dynamic programming solutions
title_sort base calling for high-throughput short-read sequencing: dynamic programming solutions
topic Methodology Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3776450/
https://www.ncbi.nlm.nih.gov/pubmed/23586484
http://dx.doi.org/10.1186/1471-2105-14-129
work_keys_str_mv AT dasshreepriya basecallingforhighthroughputshortreadsequencingdynamicprogrammingsolutions
AT vikaloharis basecallingforhighthroughputshortreadsequencingdynamicprogrammingsolutions