Basecalling Using Joint Raw and Event Nanopore Data Sequence-to-Sequence Processing
Third-generation DNA sequencers provided by Oxford Nanopore Technologies (ONT) produce a series of samples of an electrical current in the nanopore. Such a time series is used to detect the sequence of nucleotides. The task of translation of current values into nucleotide symbols is called basecalli...
Main authors: | Napieralski, Adam; Nowak, Robert |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | MDPI 2022 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8954548/ https://www.ncbi.nlm.nih.gov/pubmed/35336445 http://dx.doi.org/10.3390/s22062275 |
_version_ | 1784676120341250048 |
---|---|
author | Napieralski, Adam Nowak, Robert |
author_facet | Napieralski, Adam Nowak, Robert |
author_sort | Napieralski, Adam |
collection | PubMed |
description | Third-generation DNA sequencers provided by Oxford Nanopore Technologies (ONT) produce a series of samples of the electrical current in the nanopore. This time series is used to detect the sequence of nucleotides. The task of translating current values into nucleotide symbols is called basecalling. Various solutions for basecalling have already been proposed. The earlier ones were based on Hidden Markov Models, but the best ones use neural networks or other machine learning models. Unfortunately, the accuracy achieved is still lower than that of competing sequencing techniques, such as Illumina's. Basecallers differ in the input data type: currently, most of them work on raw data straight from the sequencer (a time series of current). Still, the approach of using event data is also explored. Event data is obtained by preprocessing raw data and dividing it into segments, each described by several features computed from the raw data values within that segment. We propose a novel basecaller that uses joint processing of raw and event data. We define basecalling as a sequence-to-sequence translation, and we use a machine learning model based on an encoder–decoder architecture of recurrent neural networks. Our model incorporates twin encoders and an attention mechanism. We tested our solution on simulated and real datasets. We compare the accuracy of the full model with that of its components, which process only raw or only event data. We also compare our solution with the existing ONT basecaller, Guppy. Results of numerical experiments show that joint raw and event data processing provides better basecalling accuracy than processing each data type separately. We implement an application called Ravvent, freely available under the MIT licence. |
format | Online Article Text |
id | pubmed-8954548 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2022 |
publisher | MDPI |
record_format | MEDLINE/PubMed |
spelling | pubmed-8954548 2022-03-26 Basecalling Using Joint Raw and Event Nanopore Data Sequence-to-Sequence Processing Napieralski, Adam Nowak, Robert Sensors (Basel) Article MDPI 2022-03-15 /pmc/articles/PMC8954548/ /pubmed/35336445 http://dx.doi.org/10.3390/s22062275 Text en © 2022 by the authors. https://creativecommons.org/licenses/by/4.0/ Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/). |
spellingShingle | Article Napieralski, Adam Nowak, Robert Basecalling Using Joint Raw and Event Nanopore Data Sequence-to-Sequence Processing |
title | Basecalling Using Joint Raw and Event Nanopore Data Sequence-to-Sequence Processing |
title_full | Basecalling Using Joint Raw and Event Nanopore Data Sequence-to-Sequence Processing |
title_fullStr | Basecalling Using Joint Raw and Event Nanopore Data Sequence-to-Sequence Processing |
title_full_unstemmed | Basecalling Using Joint Raw and Event Nanopore Data Sequence-to-Sequence Processing |
title_short | Basecalling Using Joint Raw and Event Nanopore Data Sequence-to-Sequence Processing |
title_sort | basecalling using joint raw and event nanopore data sequence-to-sequence processing |
topic | Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8954548/ https://www.ncbi.nlm.nih.gov/pubmed/35336445 http://dx.doi.org/10.3390/s22062275 |
work_keys_str_mv | AT napieralskiadam basecallingusingjointrawandeventnanoporedatasequencetosequenceprocessing AT nowakrobert basecallingusingjointrawandeventnanoporedatasequencetosequenceprocessing |
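
The description in this record says that event data is obtained by dividing the raw current samples into segments, each described by features computed from the values within it. A minimal Python sketch of that idea follows. It is an illustration only, not the method used by Ravvent or by any ONT tool: the function names `segment_events` and `_features`, the `threshold` and `min_len` parameters, the splitting rule (start a new segment when a sample deviates from the running segment mean by more than `threshold`), and the chosen features (start, length, mean, standard deviation) are all assumptions made for this example.

```python
from statistics import mean, pstdev


def _features(raw, start, end):
    """Summarise raw[start:end] as one event (hypothetical feature set)."""
    seg = raw[start:end]
    return {
        "start": start,
        "length": end - start,
        "mean": mean(seg),
        "stdv": pstdev(seg),
    }


def segment_events(raw, threshold=5.0, min_len=2):
    """Split a raw current trace into events.

    Assumed rule for illustration: once the current segment holds at least
    `min_len` samples, a sample that differs from the segment mean by more
    than `threshold` closes the segment and starts a new one.
    """
    events = []
    start = 0
    for i in range(1, len(raw)):
        seg = raw[start:i]
        if len(seg) >= min_len and abs(raw[i] - mean(seg)) > threshold:
            events.append(_features(raw, start, i))
            start = i
    if start < len(raw):
        # Close the final, still-open segment.
        events.append(_features(raw, start, len(raw)))
    return events


# Toy trace with three current levels (values are made up).
raw = [100.0, 101.0, 99.0, 100.0, 120.0, 121.0, 119.0, 80.0, 81.0, 79.0]
events = segment_events(raw)
for event in events:
    print(event)
```

On the toy trace the sketch yields three events, one per current level. A joint model such as the one the description outlines would consume both the raw samples and a per-event feature sequence of this kind; how the two streams are aligned and encoded is not specified in this record.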