NanoReviser: An Error-Correction Tool for Nanopore Sequencing Based on a Deep Learning Algorithm

Bibliographic Details
Main authors: Wang, Luotong, Qu, Li, Yang, Longshu, Wang, Yiying, Zhu, Huaiqiu
Format: Online Article Text
Language: English
Published: Frontiers Media S.A. 2020
Subjects: Genetics
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7434944/
https://www.ncbi.nlm.nih.gov/pubmed/32903372
http://dx.doi.org/10.3389/fgene.2020.00900
_version_ 1783572244566376448
author Wang, Luotong
Qu, Li
Yang, Longshu
Wang, Yiying
Zhu, Huaiqiu
author_facet Wang, Luotong
Qu, Li
Yang, Longshu
Wang, Yiying
Zhu, Huaiqiu
author_sort Wang, Luotong
collection PubMed
description Nanopore sequencing is regarded as one of the most promising third-generation sequencing (TGS) technologies. Since 2014, Oxford Nanopore Technologies (ONT) has developed a series of devices based on nanopore sequencing to produce very long reads, with an expected impact on genomics. However, nanopore sequencing reads are susceptible to a fairly high error rate owing to the difficulty of identifying the DNA bases from the complex electrical signals. Although several basecalling tools have been developed for nanopore sequencing in recent years, it is still challenging to correct the sequences after applying the basecalling procedure. In this study, we developed an open-source DNA basecalling reviser, NanoReviser, based on a deep learning algorithm to correct the basecalling errors introduced by the current default basecallers. In our module, we re-segmented the raw electrical signals based on the basecalled sequences provided by the default basecallers. By employing convolutional neural networks (CNNs) and bidirectional long short-term memory (Bi-LSTM) networks, we took advantage of the information from both the raw electrical signals and the basecalled sequences from the basecallers. Our results showed that NanoReviser, as a post-basecalling reviser, significantly improved the basecalling quality. After being trained on standard ONT sequencing reads from public E. coli and human NA12878 datasets, NanoReviser reduced the sequencing error rate by over 5% for both the E. coli dataset and the human dataset. The performance of NanoReviser was found to be better than that of all current basecalling tools. Furthermore, we analyzed the modified bases of the E. coli dataset and added the methylation information to train our module. With the methylation annotation, NanoReviser reduced the error rate by 7% for the E. coli dataset and specifically reduced the error rate by over 10% for the regions of the sequence rich in methylated bases. To the best of our knowledge, NanoReviser is the first post-processing tool after basecalling to accurately correct the nanopore sequences without the time-consuming procedure of building the consensus sequence. The NanoReviser package is freely available at https://github.com/pkubioinformatics/NanoReviser.
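The abstract reports its results as reductions in sequencing error rate. This record does not say how that rate is computed, so the following is only a minimal sketch under an assumed definition: the edit distance between a basecalled read and its reference, divided by the reference length. The function names and the toy sequences are hypothetical illustrations and are not taken from the NanoReviser package.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: fewest substitutions, insertions and deletions turning a into b."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]  # turning a[:i] into the empty string takes i deletions
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # match or substitute
        prev = curr
    return prev[-1]


def error_rate(read: str, reference: str) -> float:
    """Assumed metric: edit distance normalised by reference length."""
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(read, reference) / len(reference)


# Toy data (hypothetical): a reference, a raw basecall, and a revised call.
reference = "ACGTACGTAC"
raw_call = "ACGTTCGAC"   # one substitution and one deletion
revised = "ACGTACGAC"    # only the deletion remains

print(error_rate(raw_call, reference))  # 0.2
print(error_rate(revised, reference))   # 0.1
```

In this toy case the revised read halves the error rate. Published evaluations typically derive the rate from alignments of many reads against a reference genome rather than from a single pairwise comparison.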
format Online
Article
Text
id pubmed-7434944
institution National Center for Biotechnology Information
language English
publishDate 2020
publisher Frontiers Media S.A.
record_format MEDLINE/PubMed
spelling pubmed-7434944 2020-09-03 NanoReviser: An Error-Correction Tool for Nanopore Sequencing Based on a Deep Learning Algorithm Wang, Luotong Qu, Li Yang, Longshu Wang, Yiying Zhu, Huaiqiu Front Genet Genetics Frontiers Media S.A. 2020-08-12 /pmc/articles/PMC7434944/ /pubmed/32903372 http://dx.doi.org/10.3389/fgene.2020.00900 Text en Copyright © 2020 Wang, Qu, Yang, Wang and Zhu. http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
spellingShingle Genetics
Wang, Luotong
Qu, Li
Yang, Longshu
Wang, Yiying
Zhu, Huaiqiu
NanoReviser: An Error-Correction Tool for Nanopore Sequencing Based on a Deep Learning Algorithm
title NanoReviser: An Error-Correction Tool for Nanopore Sequencing Based on a Deep Learning Algorithm
title_full NanoReviser: An Error-Correction Tool for Nanopore Sequencing Based on a Deep Learning Algorithm
title_fullStr NanoReviser: An Error-Correction Tool for Nanopore Sequencing Based on a Deep Learning Algorithm
title_full_unstemmed NanoReviser: An Error-Correction Tool for Nanopore Sequencing Based on a Deep Learning Algorithm
title_short NanoReviser: An Error-Correction Tool for Nanopore Sequencing Based on a Deep Learning Algorithm
title_sort nanoreviser: an error-correction tool for nanopore sequencing based on a deep learning algorithm
topic Genetics
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7434944/
https://www.ncbi.nlm.nih.gov/pubmed/32903372
http://dx.doi.org/10.3389/fgene.2020.00900
work_keys_str_mv AT wangluotong nanoreviseranerrorcorrectiontoolfornanoporesequencingbasedonadeeplearningalgorithm
AT quli nanoreviseranerrorcorrectiontoolfornanoporesequencingbasedonadeeplearningalgorithm
AT yanglongshu nanoreviseranerrorcorrectiontoolfornanoporesequencingbasedonadeeplearningalgorithm
AT wangyiying nanoreviseranerrorcorrectiontoolfornanoporesequencingbasedonadeeplearningalgorithm
AT zhuhuaiqiu nanoreviseranerrorcorrectiontoolfornanoporesequencingbasedonadeeplearningalgorithm