FMLRC: Hybrid long read error correction using an FM-index

BACKGROUND: Long read sequencing is changing the landscape of genomic research, especially de novo assembly. Despite the high error rate inherent to long read technologies, increased read lengths dramatically improve the continuity and accuracy of genome assemblies. However, the cost and throughput...

Bibliographic Details
Main Authors:	Wang, Jeremy R., Holt, James, McMillan, Leonard, Jones, Corbin D.
Format:	Online Article Text
Language:	English
Published:	BioMed Central 2018
Subjects:	Methodology Article
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5807796/ https://www.ncbi.nlm.nih.gov/pubmed/29426289 http://dx.doi.org/10.1186/s12859-018-2051-3

author	Wang, Jeremy R. Holt, James McMillan, Leonard Jones, Corbin D.
collection	PubMed
description	BACKGROUND: Long read sequencing is changing the landscape of genomic research, especially de novo assembly. Despite the high error rate inherent to long read technologies, increased read lengths dramatically improve the continuity and accuracy of genome assemblies. However, the cost and throughput of these technologies limit their application to complex genomes. One solution is to decrease the cost and time to assemble novel genomes by leveraging “hybrid” assemblies that use long reads for scaffolding and short reads for accuracy. RESULTS: We describe a novel method leveraging a multi-string Burrows-Wheeler Transform with auxiliary FM-index to correct errors in long read sequences using a set of complementary short reads. We demonstrate that our method efficiently produces significantly more high-quality corrected sequence than existing hybrid error-correction methods. We also show that our method produces more contiguous assemblies, in many cases, than existing state-of-the-art hybrid and long-read-only de novo assembly methods. CONCLUSION: Our method accurately corrects long read sequence data using complementary short reads. We demonstrate higher total throughput of corrected long reads and a corresponding increase in contiguity of the resulting de novo assemblies. Improved throughput and computational efficiency relative to existing methods will help make more economical use of emerging long read sequencing technologies.
format	Online Article Text
id	pubmed-5807796
institution	National Center for Biotechnology Information
language	English
publishDate	2018
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-5807796 2018-02-15 FMLRC: Hybrid long read error correction using an FM-index Wang, Jeremy R. Holt, James McMillan, Leonard Jones, Corbin D. BMC Bioinformatics Methodology Article
BioMed Central 2018-02-09 /pmc/articles/PMC5807796/ /pubmed/29426289 http://dx.doi.org/10.1186/s12859-018-2051-3 Text en © The Author(s) 2018 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title	FMLRC: Hybrid long read error correction using an FM-index
topic	Methodology Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5807796/ https://www.ncbi.nlm.nih.gov/pubmed/29426289 http://dx.doi.org/10.1186/s12859-018-2051-3
