EC: an efficient error correction algorithm for short reads

BACKGROUND: In highly parallel next-generation sequencing (NGS) techniques millions to billions of short reads are produced from a genomic sequence in a single run. Due to the limitation of the NGS technologies, there could be errors in the reads. The error rate of the reads can be reduced with trim...

Bibliographic Details
Main authors:	Saha, Subrata; Rajasekaran, Sanguthevar
Format:	Online Article Text
Language:	English
Published:	BioMed Central 2015
Subjects:	Research
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4674864/ https://www.ncbi.nlm.nih.gov/pubmed/26678663 http://dx.doi.org/10.1186/1471-2105-16-S17-S2

_version_	1782404962703114240
author	Saha, Subrata Rajasekaran, Sanguthevar
author_facet	Saha, Subrata Rajasekaran, Sanguthevar
author_sort	Saha, Subrata
collection	PubMed
description	BACKGROUND: In highly parallel next-generation sequencing (NGS) techniques, millions to billions of short reads are produced from a genomic sequence in a single run. Due to the limitations of NGS technologies, the reads may contain errors. The error rate of the reads can be reduced by trimming and by correcting the erroneous bases of the reads. This helps to achieve high-quality data, and the computational complexity of many biological applications is greatly reduced if the reads are corrected first. We have developed a novel error correction algorithm called EC and compared it with four other state-of-the-art algorithms using both real and simulated sequencing reads. RESULTS: Our extensive experiments indicate that EC is an effective, scalable, and efficient error correction tool. The real reads employed in our performance evaluation are Illumina-generated short reads of various lengths. The six experimental datasets are taken from the Sequence Read Archive (SRA) at NCBI. The simulated reads are obtained by picking substrings from random positions of reference genomes. To introduce errors, some of the bases of the simulated reads are changed to other bases with some probability. CONCLUSIONS: Error correction is a vital problem in biology, especially for NGS data. In this paper we present a novel algorithm, called Error Corrector (EC), for correcting substitution errors in biological sequencing reads. We plan to investigate whether the techniques introduced in this paper can also handle insertion and deletion errors. SOFTWARE AVAILABILITY: The implementation is freely available for non-commercial purposes. It can be downloaded from: http://engr.uconn.edu/~rajasek/EC.zip.
format	Online Article Text
id	pubmed-4674864
institution	National Center for Biotechnology Information
language	English
publishDate	2015
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-4674864 2015-12-15 EC: an efficient error correction algorithm for short reads Saha, Subrata Rajasekaran, Sanguthevar BMC Bioinformatics Research
BioMed Central 2015-12-07 /pmc/articles/PMC4674864/ /pubmed/26678663 http://dx.doi.org/10.1186/1471-2105-16-S17-S2 Text en Copyright © 2015 Saha and Rajasekaran http://creativecommons.org/licenses/by/4.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
spellingShingle	Research Saha, Subrata Rajasekaran, Sanguthevar EC: an efficient error correction algorithm for short reads
title	EC: an efficient error correction algorithm for short reads
title_full	EC: an efficient error correction algorithm for short reads
title_fullStr	EC: an efficient error correction algorithm for short reads
title_full_unstemmed	EC: an efficient error correction algorithm for short reads
title_short	EC: an efficient error correction algorithm for short reads
title_sort	ec: an efficient error correction algorithm for short reads
topic	Research
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4674864/ https://www.ncbi.nlm.nih.gov/pubmed/26678663 http://dx.doi.org/10.1186/1471-2105-16-S17-S2
work_keys_str_mv	AT sahasubrata ecanefficienterrorcorrectionalgorithmforshortreads AT rajasekaransanguthevar ecanefficienterrorcorrectionalgorithmforshortreads
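
The abstract in this record describes how the simulated reads were generated: substrings are picked from random positions of a reference genome, and some bases are then changed to other bases with some probability. The following is a minimal sketch of that idea and not the authors' implementation. The function name `simulate_reads`, its parameters, the uniform per-base substitution rate, and the restriction to the alphabet A, C, G, T are all assumptions made for illustration; the record does not state the actual error model.

```python
import random

BASES = "ACGT"  # assumed alphabet; the record does not specify one


def simulate_reads(genome, read_len, n_reads, sub_rate, seed=0):
    """Return a list of (error_free_read, noisy_read) pairs.

    Hypothetical helper: each read is a substring taken at a random
    position of `genome`; each base is then replaced by a different
    base with probability `sub_rate` (substitution errors only).
    """
    if read_len > len(genome):
        raise ValueError("read_len must not exceed the genome length")
    rng = random.Random(seed)  # seeded so that runs are reproducible
    pairs = []
    for _ in range(n_reads):
        start = rng.randrange(len(genome) - read_len + 1)
        clean = genome[start:start + read_len]
        noisy = "".join(
            # Substitute with one of the other bases, chosen uniformly.
            rng.choice([b for b in BASES if b != base])
            if rng.random() < sub_rate
            else base
            for base in clean
        )
        pairs.append((clean, noisy))
    return pairs
```

Keeping the error-free read next to its noisy copy makes it possible to count how many introduced substitutions a correction tool recovers, which is presumably why simulated data is useful alongside real reads.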
