
Iterative error correction of long sequencing reads maximizes accuracy and improves contig assembly

Next-generation sequencers such as Illumina can now produce reads up to 300 bp with high throughput, which is attractive for genome assembly. A first step in genome assembly is to computationally correct sequencing errors. However, correcting all errors in these longer reads is challenging. Here, we...

Bibliographic Details
Main Authors:	Sameith, Katrin; Roscito, Juliana G; Hiller, Michael
Format:	Online Article Text
Language:	English
Published:	Oxford University Press 2017
Subjects:	Paper
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5221426/ https://www.ncbi.nlm.nih.gov/pubmed/26868358 http://dx.doi.org/10.1093/bib/bbw003

author	Sameith, Katrin; Roscito, Juliana G; Hiller, Michael
collection	PubMed
description	Next-generation sequencers such as Illumina can now produce reads up to 300 bp with high throughput, which is attractive for genome assembly. A first step in genome assembly is to computationally correct sequencing errors. However, correcting all errors in these longer reads is challenging. Here, we show that reads with remaining errors after correction often overlap repeats, where short erroneous k-mers occur in other copies of the repeat. We developed an iterative error correction pipeline that runs the previously published String Graph Assembler (SGA) in multiple rounds of k-mer-based correction with an increasing k-mer size, followed by a final round of overlap-based correction. By combining the advantages of small and large k-mers, this approach corrects more errors in repeats and minimizes the total amount of erroneous reads. We show that higher read accuracy increases contig lengths two to three times. We provide SGA-Iteratively Correcting Errors (https://github.com/hillerlab/IterativeErrorCorrection/) that implements iterative error correction by using modules from SGA.
format	Online Article Text
id	pubmed-5221426
institution	National Center for Biotechnology Information
language	English
publishDate	2017
publisher	Oxford University Press
record_format	MEDLINE/PubMed
spelling	pubmed-5221426 2017-01-12 Brief Bioinform Paper Oxford University Press 2017-01 2016-02-10 /pmc/articles/PMC5221426/ /pubmed/26868358 http://dx.doi.org/10.1093/bib/bbw003 Text en © The Author 2016. Published by Oxford University Press. This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com
title	Iterative error correction of long sequencing reads maximizes accuracy and improves contig assembly
topic	Paper
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5221426/ https://www.ncbi.nlm.nih.gov/pubmed/26868358 http://dx.doi.org/10.1093/bib/bbw003
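
The description field of this record summarizes a strategy of repeated k-mer-based correction with an increasing k-mer size. The following is a minimal toy sketch of that general idea in Python, not the SGA implementation and not the authors' pipeline. The function names (`count_kmers`, `correct_read`, `iterative_correct`), the k-mer sizes, and the `min_count` threshold are all hypothetical choices made for illustration. The final overlap-based round mentioned in the description is not sketched here, and the toy data contain no repeats, so the example only shows the mechanics of the loop.

```python
from collections import Counter


def count_kmers(reads, k):
    """Count every k-mer occurring in the given reads."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def kmers_covering(seq, pos, k):
    """Return all k-mers of seq that include position pos."""
    start = max(0, pos - k + 1)
    end = min(pos, len(seq) - k)
    return [seq[s:s + k] for s in range(start, end + 1)]


def correct_read(read, counts, k, min_count):
    """Toy k-mer-spectrum correction (illustrative assumption, not SGA).

    A base is substituted only if the change makes every k-mer that
    covers it "solid", i.e. seen at least min_count times.
    """
    seq = read
    for pos in range(len(seq)):
        if all(counts[km] >= min_count for km in kmers_covering(seq, pos, k)):
            continue  # position is already supported by solid k-mers
        for base in "ACGT":
            if base == seq[pos]:
                continue
            cand = seq[:pos] + base + seq[pos + 1:]
            if all(counts[km] >= min_count for km in kmers_covering(cand, pos, k)):
                seq = cand
                break
    return seq


def iterative_correct(reads, k_values=(5, 7), min_count=2):
    """Run one correction round per k, from the smallest k to the largest.

    k-mer counts are recomputed from the corrected reads before each round.
    """
    for k in sorted(k_values):
        counts = count_kmers(reads, k)
        reads = [correct_read(r, counts, k, min_count) for r in reads]
    return reads


if __name__ == "__main__":
    genome = "ATGGCGTACCTTAGAGTCCA"
    noisy = "ATGGCGTACGTTAGAGTCCA"  # single substitution at index 9
    reads = [genome] * 4 + [noisy]
    corrected = iterative_correct(reads)
    print(corrected[-1] == genome)
```

In this toy setting the single substitution is already repaired in the first round. Real data, where repeat copies share short k-mers, is the case the description says benefits from later rounds with larger k-mers.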
