ntHash: recursive nucleotide hashing

Motivation: Hashing has been widely used for indexing, querying and rapid similarity search in many bioinformatics applications, including sequence alignment, genome and transcriptome assembly, k-mer counting and error correction. Hence, expediting hashing operations would have a substantial impact...

Bibliographic Details
Main authors: Mohamadi, Hamid; Chu, Justin; Vandervalk, Benjamin P.; Birol, Inanc
Format: Online Article Text
Language: English
Published: Oxford University Press 2016
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5181554/
https://www.ncbi.nlm.nih.gov/pubmed/27423894
http://dx.doi.org/10.1093/bioinformatics/btw397
author Mohamadi, Hamid
Chu, Justin
Vandervalk, Benjamin P.
Birol, Inanc
collection PubMed
description Motivation: Hashing has been widely used for indexing, querying and rapid similarity search in many bioinformatics applications, including sequence alignment, genome and transcriptome assembly, k-mer counting and error correction. Hence, expediting hashing operations would have a substantial impact in the field, making bioinformatics applications faster and more efficient. Results: We present ntHash, a hashing algorithm tuned for processing DNA/RNA sequences. It performs the best when calculating hash values for adjacent k-mers in an input sequence, operating an order of magnitude faster than the best performing alternatives in typical use cases. Availability and implementation: ntHash is available online at http://www.bcgsc.ca/platform/bioinfo/software/nthash and is free for academic use. Contacts: hmohamadi@bcgsc.ca or ibirol@bcgsc.ca Supplementary information: Supplementary data are available at Bioinformatics online.
format Online
Article
Text
id pubmed-5181554
institution National Center for Biotechnology Information
language English
publishDate 2016
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-5181554 2016-12-27 ntHash: recursive nucleotide hashing Mohamadi, Hamid Chu, Justin Vandervalk, Benjamin P. Birol, Inanc Bioinformatics Applications Notes Oxford University Press 2016-11-15 2016-07-16 /pmc/articles/PMC5181554/ /pubmed/27423894 http://dx.doi.org/10.1093/bioinformatics/btw397 Text en © The Author 2016. Published by Oxford University Press. http://creativecommons.org/licenses/by-nc/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com
title ntHash: recursive nucleotide hashing
topic Applications Notes
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5181554/
https://www.ncbi.nlm.nih.gov/pubmed/27423894
http://dx.doi.org/10.1093/bioinformatics/btw397
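
Illustrative note: the description above says the method performs best when hashing adjacent k-mers in an input sequence. The sketch below shows the general idea with a generic cyclic-polynomial rolling hash, in which the hash of each k-mer is derived from the hash of the previous one in constant time. It is not taken from the ntHash source: the seed constants, the function names and the restriction to A/C/G/T are assumptions made for this example only.

```python
MASK = (1 << 64) - 1  # keep every value within 64 bits

# Hypothetical per-base seeds: arbitrary 64-bit constants chosen for this
# sketch, not the values used by any published implementation.
SEED = {
    "A": 0x9E3779B97F4A7C15,
    "C": 0xC2B2AE3D27D4EB4F,
    "G": 0x165667B19E3779F9,
    "T": 0x27D4EB2F165667C5,
}


def rol(x, r):
    """Rotate the 64-bit value x left by r bits (r is taken modulo 64)."""
    r %= 64
    return ((x << r) | (x >> (64 - r))) & MASK


def hash_kmer(kmer):
    """Hash one k-mer from scratch: XOR of rotated base seeds, O(k) work."""
    k = len(kmer)
    h = 0
    for i, base in enumerate(kmer):
        h ^= rol(SEED[base], k - 1 - i)
    return h


def rolling_hashes(seq, k):
    """Yield the hash of every k-mer in seq, left to right.

    Only the first k-mer is hashed from scratch. Each later hash is
    obtained from the previous one by rotating it one bit, cancelling
    the base that left the window and adding the base that entered.
    """
    if k <= 0 or len(seq) < k:
        return
    h = hash_kmer(seq[:k])
    yield h
    for i in range(len(seq) - k):
        h = rol(h, 1) ^ rol(SEED[seq[i]], k) ^ SEED[seq[i + k]]
        yield h


if __name__ == "__main__":
    for pos, value in enumerate(rolling_hashes("ACGTTGCAAC", 4)):
        print(pos, format(value, "016x"))
```

Because rotation distributes over XOR, each update costs a fixed number of operations regardless of k, which is why hashing consecutive k-mers this way can be much cheaper than rehashing every window.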