De novo Nanopore read quality improvement using deep learning

BACKGROUND: Long read sequencing technologies such as Oxford Nanopore can greatly decrease the complexity of de novo genome assembly and large structural variation identification. Currently Nanopore reads have high error rates, and the errors often cluster into low-quality segments within the reads....

Bibliographic Details
Main authors: LaPierre, Nathan; Egan, Rob; Wang, Wei; Wang, Zhong
Format: Online Article Text
Language: English
Published: BioMed Central 2019
Subjects: Software
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6833143/
https://www.ncbi.nlm.nih.gov/pubmed/31694525
http://dx.doi.org/10.1186/s12859-019-3103-z
author LaPierre, Nathan
Egan, Rob
Wang, Wei
Wang, Zhong
collection PubMed
description BACKGROUND: Long read sequencing technologies such as Oxford Nanopore can greatly decrease the complexity of de novo genome assembly and large structural variation identification. Currently, Nanopore reads have high error rates, and the errors often cluster into low-quality segments within the reads. The limited sensitivity of existing read-based error correction methods can cause large-scale mis-assemblies in the assembled genomes, motivating further innovation in this area. RESULTS: Here we developed a Convolutional Neural Network (CNN) based method, called MiniScrub, for identification and subsequent “scrubbing” (removal) of low-quality Nanopore read segments to minimize their interference in the downstream assembly process. MiniScrub first generates read-to-read overlaps via MiniMap2, then encodes the overlaps into images, and finally builds CNN models to predict low-quality segments. Applying MiniScrub to real-world control datasets under several different parameters, we show that it robustly improves read quality, and improves read error correction in the metagenome setting. Compared to raw reads, de novo genome assembly with scrubbed reads produces many fewer mis-assemblies and large indel errors. CONCLUSIONS: MiniScrub is able to robustly improve read quality of Oxford Nanopore reads, especially in the metagenome setting, making it useful for downstream applications such as de novo assembly. We propose MiniScrub as a tool for preprocessing Nanopore reads for downstream analyses. MiniScrub is open-source software and is available at https://bitbucket.org/berkeleylab/jgi-miniscrub.
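The final "scrubbing" step named in the description can be illustrated with a minimal sketch. It is not MiniScrub's code: it assumes some model has already produced one quality score per fixed-length read segment, and the function name `scrub_read`, its parameters, and all default values are hypothetical choices made for illustration only.

```python
from typing import List, Tuple


def scrub_read(
    sequence: str,
    segment_scores: List[float],
    segment_length: int = 48,   # hypothetical segment size
    cutoff: float = 0.8,        # hypothetical minimum score to keep a segment
    min_length: int = 100,      # hypothetical minimum length of a kept piece
) -> List[Tuple[int, str]]:
    """Remove low-scoring segments and return the remaining pieces.

    Each returned item is (start offset in the original read, subsequence).
    Bases beyond the last scored segment are dropped in this sketch.
    """
    pieces: List[Tuple[int, str]] = []
    start = None  # start offset of the current run of retained segments
    for index, score in enumerate(segment_scores):
        seg_start = index * segment_length
        if score >= cutoff:
            if start is None:
                start = seg_start  # open a new retained run
        elif start is not None:
            # A low-scoring segment closes the current retained run.
            pieces.append((start, sequence[start:seg_start]))
            start = None
    if start is not None:
        end = min(len(sequence), len(segment_scores) * segment_length)
        pieces.append((start, sequence[start:end]))
    # Discard pieces too short to be useful downstream.
    return [(offset, piece) for offset, piece in pieces if len(piece) >= min_length]
```

For a 40-base read scored in four 10-base segments as `[0.9, 0.95, 0.2, 0.9]`, calling `scrub_read(read, scores, segment_length=10, min_length=15)` keeps only the first 20 bases: the third segment is removed and the trailing 10-base piece falls below the length threshold.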
format Online
Article
Text
id pubmed-6833143
institution National Center for Biotechnology Information
language English
publishDate 2019
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-6833143 2019-11-08 De novo Nanopore read quality improvement using deep learning LaPierre, Nathan Egan, Rob Wang, Wei Wang, Zhong BMC Bioinformatics Software
BioMed Central 2019-11-06 /pmc/articles/PMC6833143/ /pubmed/31694525 http://dx.doi.org/10.1186/s12859-019-3103-z Text en © The Author(s) 2019 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title De novo Nanopore read quality improvement using deep learning
topic Software