
HECTOR: a parallel multistage homopolymer spectrum based error corrector for 454 sequencing data

Bibliographic Details
Main Authors: Wirawan, Adrianto, Harris, Robert S, Liu, Yongchao, Schmidt, Bertil, Schröder, Jan
Format: Online Article Text
Language: English
Published: BioMed Central 2014
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4023493/
https://www.ncbi.nlm.nih.gov/pubmed/24885381
http://dx.doi.org/10.1186/1471-2105-15-131
author Wirawan, Adrianto
Harris, Robert S
Liu, Yongchao
Schmidt, Bertil
Schröder, Jan
collection PubMed
description BACKGROUND: Current-generation sequencing technologies are able to produce low-cost, high-throughput reads. However, the produced reads are imperfect and may contain various sequencing errors. Although many error correction methods have been developed in recent years, none explicitly targets homopolymer-length errors in 454 sequencing reads. RESULTS: We present HECTOR, a parallel multistage homopolymer spectrum based error corrector for 454 sequencing data. In this algorithm, we investigate for the first time a novel homopolymer spectrum based approach to handle homopolymer insertions or deletions, which are the dominant sequencing errors in 454 pyrosequencing reads. We have evaluated the performance of HECTOR, in terms of correction quality, runtime and parallel scalability, using both simulated and real pyrosequencing datasets. We have further compared this performance to that of Coral, a state-of-the-art error corrector based on multiple sequence alignment, and of Acacia, a recently published error corrector for amplicon pyrosequences. Our evaluations reveal that HECTOR achieves correction quality comparable to Coral, but runs 3.7× faster on average. In addition, HECTOR performs well even when the coverage of the dataset is low. CONCLUSION: Our homopolymer spectrum based approach is theoretically capable of handling homopolymer-length errors of arbitrary length, with linear time complexity. HECTOR employs a multi-threaded design based on a master-slave computing model. Our experimental results show that HECTOR is a practical 454 pyrosequencing read error corrector that is competitive in terms of both correction quality and speed. The source code and all simulated data are available at: http://hector454.sourceforge.net.
format Online
Article
Text
id pubmed-4023493
institution National Center for Biotechnology Information
language English
publishDate 2014
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-4023493 2014-05-17 BMC Bioinformatics Methodology Article BioMed Central 2014-05-06 /pmc/articles/PMC4023493/ /pubmed/24885381 http://dx.doi.org/10.1186/1471-2105-15-131 Text en Copyright © 2014 Wirawan et al.; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited.
title HECTOR: a parallel multistage homopolymer spectrum based error corrector for 454 sequencing data
topic Methodology Article
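The description above names a homopolymer spectrum based approach but does not spell out the algorithm. As a rough illustration only, the Python sketch below shows one plausible reading of the idea, assuming that each read is collapsed into homopolymer runs (base, run length), that a "spectrum" counts windows of k consecutive runs across all reads, and that a rare window is repaired by adopting a frequent window with the same bases whose run lengths differ by at most one. The function names, the window size `k` and the `min_count` threshold are hypothetical; this is not HECTOR's implementation, which is available from the project site.

```python
from collections import Counter
from itertools import groupby


def homopolymer_runs(read):
    """Collapse a read into (base, run_length) pairs, e.g. 'AAACG' -> [('A', 3), ('C', 1), ('G', 1)]."""
    return [(base, len(list(group))) for base, group in groupby(read)]


def homopolymer_kmers(read, k):
    """Yield every window of k consecutive homopolymer runs in the read."""
    runs = homopolymer_runs(read)
    for i in range(len(runs) - k + 1):
        yield tuple(runs[i:i + k])


def homopolymer_spectrum(reads, k):
    """Count how often each homopolymer k-mer occurs across all reads (single pass over the reads)."""
    spectrum = Counter()
    for read in reads:
        spectrum.update(homopolymer_kmers(read, k))
    return spectrum


def correct_read(read, spectrum, k, min_count):
    """Hypothetical correction pass: replace each weak window (count < min_count) with the most
    frequent solid window that has the same bases and run lengths within +/-1 of the observed ones."""
    runs = homopolymer_runs(read)
    for i in range(len(runs) - k + 1):
        window = tuple(runs[i:i + k])
        if spectrum[window] >= min_count:
            continue  # window is well supported; leave it alone
        candidates = [
            (count, kmer)
            for kmer, count in spectrum.items()
            if count >= min_count
            and [b for b, _ in kmer] == [b for b, _ in window]
            and all(abs(a - b) <= 1 for (_, a), (_, b) in zip(kmer, window))
        ]
        if candidates:
            _, best = max(candidates)  # highest count wins; ties broken by tuple order
            runs[i:i + k] = list(best)  # same bases, so only run lengths change
    # Expand the (possibly corrected) runs back into a nucleotide string.
    return "".join(base * length for base, length in runs)
```

For example, with five copies of `AAACGTT` and one `AAAACGTT` (an over-called A homopolymer) as `reads`, `correct_read("AAAACGTT", homopolymer_spectrum(reads, 3), 3, 3)` returns `AAACGTT`. Only run lengths are ever changed, so substitutions are left untouched. The candidate search here scans the whole spectrum for every weak window, which is simple but not linear; a practical tool would index the spectrum by base sequence instead.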