RawHash: enabling fast and accurate real-time analysis of raw nanopore signals for large genomes
Summary: Nanopore sequencers generate electrical raw signals in real-time while sequencing long genomic strands. These raw signals can be analyzed as they are generated, providing an opportunity for real-time genome analysis. An important feature of nanopore sequencing, Read Until, can eject strands...
Main authors: | Firtina, Can; Mansouri Ghiasi, Nika; Lindegger, Joel; Singh, Gagandeep; Cavlak, Meryem Banu; Mao, Haiyu; Mutlu, Onur |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Oxford University Press 2023 |
Subjects: | Genome Sequence Analysis |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10311405/ https://www.ncbi.nlm.nih.gov/pubmed/37387139 http://dx.doi.org/10.1093/bioinformatics/btad272 |
_version_ | 1785066733586874368 |
---|---|
author | Firtina, Can; Mansouri Ghiasi, Nika; Lindegger, Joel; Singh, Gagandeep; Cavlak, Meryem Banu; Mao, Haiyu; Mutlu, Onur |
author_facet | Firtina, Can; Mansouri Ghiasi, Nika; Lindegger, Joel; Singh, Gagandeep; Cavlak, Meryem Banu; Mao, Haiyu; Mutlu, Onur |
author_sort | Firtina, Can |
collection | PubMed |
description | Summary: Nanopore sequencers generate electrical raw signals in real-time while sequencing long genomic strands. These raw signals can be analyzed as they are generated, providing an opportunity for real-time genome analysis. An important feature of nanopore sequencing, Read Until, can eject strands from sequencers without fully sequencing them, which provides opportunities to computationally reduce the sequencing time and cost. However, existing works utilizing Read Until either (i) require powerful computational resources that may not be available for portable sequencers or (ii) lack scalability for large genomes, rendering them inaccurate or ineffective. We propose RawHash, the first mechanism that can accurately and efficiently perform real-time analysis of nanopore raw signals for large genomes using a hash-based similarity search. To enable this, RawHash ensures the signals corresponding to the same DNA content lead to the same hash value, regardless of the slight variations in these signals. RawHash achieves an accurate hash-based similarity search via an effective quantization of the raw signals such that signals corresponding to the same DNA content have the same quantized value and, subsequently, the same hash value. We evaluate RawHash on three applications: (i) read mapping, (ii) relative abundance estimation, and (iii) contamination analysis. Our evaluations show that RawHash is the only tool that can provide high accuracy and high throughput for analyzing large genomes in real-time. When compared to the state-of-the-art techniques, UNCALLED and Sigmap, RawHash provides (i) [Formula: see text] and [Formula: see text] better average throughput and (ii) significantly better accuracy for large genomes, respectively. Source code is available at https://github.com/CMU-SAFARI/RawHash. |
format | Online Article Text |
id | pubmed-10311405 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2023 |
publisher | Oxford University Press |
record_format | MEDLINE/PubMed |
spelling | pubmed-103114052023-07-01 RawHash: enabling fast and accurate real-time analysis of raw nanopore signals for large genomes Firtina, Can Mansouri Ghiasi, Nika Lindegger, Joel Singh, Gagandeep Cavlak, Meryem Banu Mao, Haiyu Mutlu, Onur Bioinformatics Genome Sequence Analysis Summary: Nanopore sequencers generate electrical raw signals in real-time while sequencing long genomic strands. These raw signals can be analyzed as they are generated, providing an opportunity for real-time genome analysis. An important feature of nanopore sequencing, Read Until, can eject strands from sequencers without fully sequencing them, which provides opportunities to computationally reduce the sequencing time and cost. However, existing works utilizing Read Until either (i) require powerful computational resources that may not be available for portable sequencers or (ii) lack scalability for large genomes, rendering them inaccurate or ineffective. We propose RawHash, the first mechanism that can accurately and efficiently perform real-time analysis of nanopore raw signals for large genomes using a hash-based similarity search. To enable this, RawHash ensures the signals corresponding to the same DNA content lead to the same hash value, regardless of the slight variations in these signals. RawHash achieves an accurate hash-based similarity search via an effective quantization of the raw signals such that signals corresponding to the same DNA content have the same quantized value and, subsequently, the same hash value. We evaluate RawHash on three applications: (i) read mapping, (ii) relative abundance estimation, and (iii) contamination analysis. Our evaluations show that RawHash is the only tool that can provide high accuracy and high throughput for analyzing large genomes in real-time. When compared to the state-of-the-art techniques, UNCALLED and Sigmap, RawHash provides (i) [Formula: see text] and [Formula: see text] better average throughput and (ii) significantly better accuracy for large genomes, respectively. Source code is available at https://github.com/CMU-SAFARI/RawHash. Oxford University Press 2023-06-30 /pmc/articles/PMC10311405/ /pubmed/37387139 http://dx.doi.org/10.1093/bioinformatics/btad272 Text en © The Author(s) 2023. Published by Oxford University Press. https://creativecommons.org/licenses/by/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited. |
spellingShingle | Genome Sequence Analysis Firtina, Can Mansouri Ghiasi, Nika Lindegger, Joel Singh, Gagandeep Cavlak, Meryem Banu Mao, Haiyu Mutlu, Onur RawHash: enabling fast and accurate real-time analysis of raw nanopore signals for large genomes |
title | RawHash: enabling fast and accurate real-time analysis of raw nanopore signals for large genomes |
title_full | RawHash: enabling fast and accurate real-time analysis of raw nanopore signals for large genomes |
title_fullStr | RawHash: enabling fast and accurate real-time analysis of raw nanopore signals for large genomes |
title_full_unstemmed | RawHash: enabling fast and accurate real-time analysis of raw nanopore signals for large genomes |
title_short | RawHash: enabling fast and accurate real-time analysis of raw nanopore signals for large genomes |
title_sort | rawhash: enabling fast and accurate real-time analysis of raw nanopore signals for large genomes |
topic | Genome Sequence Analysis |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10311405/ https://www.ncbi.nlm.nih.gov/pubmed/37387139 http://dx.doi.org/10.1093/bioinformatics/btad272 |
work_keys_str_mv | AT firtinacan rawhashenablingfastandaccuraterealtimeanalysisofrawnanoporesignalsforlargegenomes AT mansourighiasinika rawhashenablingfastandaccuraterealtimeanalysisofrawnanoporesignalsforlargegenomes AT lindeggerjoel rawhashenablingfastandaccuraterealtimeanalysisofrawnanoporesignalsforlargegenomes AT singhgagandeep rawhashenablingfastandaccuraterealtimeanalysisofrawnanoporesignalsforlargegenomes AT cavlakmeryembanu rawhashenablingfastandaccuraterealtimeanalysisofrawnanoporesignalsforlargegenomes AT maohaiyu rawhashenablingfastandaccuraterealtimeanalysisofrawnanoporesignalsforlargegenomes AT mutluonur rawhashenablingfastandaccuraterealtimeanalysisofrawnanoporesignalsforlargegenomes |
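
Illustrative note: the description field above says that RawHash quantizes raw signals so that slightly different signals from the same DNA content receive the same quantized value and therefore the same hash value. The sketch below is one way to picture that idea; it is not the authors' implementation. The function names, the equal-width bucketing over [-3, 3], the 16 levels, and the window length of 4 are all assumptions chosen for illustration, and the inputs are made-up numbers standing in for normalized signal values.

```python
from typing import Dict, List, Sequence, Tuple


def quantize(values: Sequence[float], lo: float = -3.0, hi: float = 3.0,
             levels: int = 16) -> List[int]:
    """Map each value to one of `levels` equal-width buckets over [lo, hi].

    Hypothetical scheme: values closer together than a bucket width usually
    share a bucket, although values near a bucket boundary may not.
    """
    width = (hi - lo) / levels
    buckets = []
    for v in values:
        clipped = min(max(v, lo), hi)
        buckets.append(min(int((clipped - lo) / width), levels - 1))
    return buckets


def window_keys(buckets: Sequence[int], k: int = 4, levels: int = 16) -> List[int]:
    """Pack every run of k consecutive buckets into a single integer key."""
    keys = []
    for i in range(len(buckets) - k + 1):
        key = 0
        for b in buckets[i:i + k]:
            key = key * levels + b
        keys.append(key)
    return keys


def build_index(reference: Sequence[float]) -> Dict[int, List[int]]:
    """Hash table from window key to the reference positions where it occurs."""
    index: Dict[int, List[int]] = {}
    for pos, key in enumerate(window_keys(quantize(reference))):
        index.setdefault(key, []).append(pos)
    return index


def lookup(index: Dict[int, List[int]], query: Sequence[float]) -> List[Tuple[int, int]]:
    """Return (query_position, reference_position) pairs with identical keys."""
    hits = []
    for qpos, key in enumerate(window_keys(quantize(query))):
        for rpos in index.get(key, []):
            hits.append((qpos, rpos))
    return hits


# Made-up reference values and a slightly perturbed copy of reference[2:7].
reference = [-1.2, 0.3, 1.1, -0.4, 0.8, 2.0, -2.1, 0.05]
query = [1.08, -0.45, 0.85, 2.05, -2.15]
hits = lookup(build_index(reference), query)
print(hits)
```

In this toy example the perturbed query values fall into the same buckets as the reference values they were derived from, so the exact-match hash lookup still finds them. Coarser buckets tolerate more variation but make unrelated regions more likely to collide, which is the trade-off any such quantization has to balance.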