Jabba: hybrid error correction for long sequencing reads

BACKGROUND: Third generation sequencing platforms produce longer reads with higher error rates than second generation technologies. While the improved read length can provide useful information for downstream analysis, underlying algorithms are challenged by the high error rate. Error correction met...

Bibliographic Details
Main Authors:	Miclotte, Giles, Heydari, Mahdi, Demeester, Piet, Rombauts, Stephane, Van de Peer, Yves, Audenaert, Pieter, Fostier, Jan
Format:	Online Article Text
Language:	English
Published:	BioMed Central 2016
Subjects:	Software Article
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4855726/ https://www.ncbi.nlm.nih.gov/pubmed/27148393 http://dx.doi.org/10.1186/s13015-016-0075-7

_version_	1782430402300870656
author	Miclotte, Giles Heydari, Mahdi Demeester, Piet Rombauts, Stephane Van de Peer, Yves Audenaert, Pieter Fostier, Jan
author_sort	Miclotte, Giles
collection	PubMed
description	BACKGROUND: Third generation sequencing platforms produce longer reads with higher error rates than second generation technologies. While the improved read length can provide useful information for downstream analysis, underlying algorithms are challenged by the high error rate. Error correction methods in which accurate short reads are used to correct noisy long reads appear to be attractive to generate high-quality long reads. Methods that align short reads to long reads do not optimally use the information contained in the second generation data, and suffer from large runtimes. Recently, a new hybrid error correcting method has been proposed, where the second generation data is first assembled into a de Bruijn graph, on which the long reads are then aligned. RESULTS: In this context we present Jabba, a hybrid method to correct long third generation reads by mapping them on a corrected de Bruijn graph that was constructed from second generation data. Unique to our method is the use of a pseudo alignment approach with a seed-and-extend methodology, using maximal exact matches (MEMs) as seeds. In addition to benchmark results, certain theoretical results concerning the possibilities and limitations of the use of MEMs in the context of third generation reads are presented. CONCLUSION: Jabba produces highly reliable corrected reads: almost all corrected reads align to the reference, and these alignments have a very high identity. Many of the aligned reads are error-free. Additionally, Jabba corrects reads using a very low amount of CPU time. From this we conclude that pseudo alignment with MEMs is a fast and reliable method to map long highly erroneous sequences on a de Bruijn graph.
format	Online Article Text
id	pubmed-4855726
institution	National Center for Biotechnology Information
language	English
publishDate	2016
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-4855726 2016-05-05 Jabba: hybrid error correction for long sequencing reads Miclotte, Giles Heydari, Mahdi Demeester, Piet Rombauts, Stephane Van de Peer, Yves Audenaert, Pieter Fostier, Jan Algorithms Mol Biol Software Article
BioMed Central 2016-05-03 /pmc/articles/PMC4855726/ /pubmed/27148393 http://dx.doi.org/10.1186/s13015-016-0075-7 Text en © Miclotte et al. 2016 Open Access. This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title	Jabba: hybrid error correction for long sequencing reads
topic	Software Article
