
KmerKeys: a web resource for searching indexed genome assemblies and variants

K-mers are short DNA sequences that are used for genome sequence analysis. Applications that use k-mers include genome assembly and alignment. However, the wider bioinformatic use of these short sequences has challenges related to the massive scale of genomic sequence data. A single human genome ass...


Bibliographic Details
Main authors: Pavlichin, Dmitri S, Lee, HoJoon, Greer, Stephanie U, Grimes, Susan M, Weissman, Tsachy, Ji, Hanlee P
Format: Online Article Text
Language: English
Published: Oxford University Press 2022
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9252721/
https://www.ncbi.nlm.nih.gov/pubmed/35474383
http://dx.doi.org/10.1093/nar/gkac266
author Pavlichin, Dmitri S
Lee, HoJoon
Greer, Stephanie U
Grimes, Susan M
Weissman, Tsachy
Ji, Hanlee P
collection PubMed
description K-mers are short DNA sequences that are used for genome sequence analysis. Applications that use k-mers include genome assembly and alignment. However, the wider bioinformatic use of these short sequences faces challenges related to the massive scale of genomic sequence data. A single human genome assembly has billions of k-mers. As a result, the computational requirements for analyzing k-mer information are enormous, particularly when complete genome assemblies are involved. To address these issues, we developed a new indexing data structure based on a hash table tuned for the lookup of short sequence keys. This web application, referred to as KmerKeys, provides rapid query speeds for cloud computation on genome assemblies. We enable fuzzy as well as exact sequence searches of assemblies. To enable robust and speedy performance, the website implements cache-friendly hash tables, memory mapping and massively parallel processing. Our method employs a scalable and efficient data structure that can be used to jointly index and search a large collection of human genome assembly information. One can include variant databases and their associated metadata, such as the gnomAD population variant catalogue. This feature enables the incorporation of future genomic information into sequencing analysis. KmerKeys is freely accessible at https://kmerkeys.dgi-stanford.org.
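The description mentions a hash-table index over short sequence keys with exact and fuzzy search. The following is only a minimal illustrative sketch of that general idea, not the KmerKeys implementation: it uses a plain Python dictionary rather than the cache-friendly, memory-mapped tables the authors describe, and it assumes "fuzzy" means at most one base substitution. The function names `build_kmer_index`, `exact_search`, and `fuzzy_search` are hypothetical.

```python
from collections import defaultdict


def build_kmer_index(sequences, k):
    """Map each k-mer to the (sequence name, 0-based offset) pairs where it occurs.

    `sequences` is a dict of name -> DNA string (an assumed toy input format).
    """
    index = defaultdict(list)
    for name, seq in sequences.items():
        seq = seq.upper()
        for i in range(len(seq) - k + 1):
            index[seq[i:i + k]].append((name, i))
    return index


def exact_search(index, kmer):
    """Return every indexed position of the k-mer, or an empty list."""
    return index.get(kmer.upper(), [])


def fuzzy_search(index, kmer, alphabet="ACGT"):
    """Return hits for the k-mer and for every variant one substitution away.

    Assumption: 'fuzzy' here means Hamming distance <= 1; the real service
    may define it differently.
    """
    kmer = kmer.upper()
    candidates = {kmer}
    for i, base in enumerate(kmer):
        for alt in alphabet:
            if alt != base:
                candidates.add(kmer[:i] + alt + kmer[i + 1:])
    # Membership tests do not create entries in the defaultdict.
    return {cand: index[cand] for cand in candidates if cand in index}


if __name__ == "__main__":
    toy = {"chrA": "ACGTACGT", "chrB": "ACCTAC"}  # made-up sequences
    idx = build_kmer_index(toy, k=4)
    print(exact_search(idx, "ACGT"))
    print(fuzzy_search(idx, "ACGT"))
```

Enumerating the one-substitution neighbours costs 3k extra lookups per query, which stays cheap for short keys; allowing more mismatches or indels would need a different strategy.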
format Online
Article
Text
id pubmed-9252721
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-9252721 2022-07-05 KmerKeys: a web resource for searching indexed genome assemblies and variants Pavlichin, Dmitri S Lee, HoJoon Greer, Stephanie U Grimes, Susan M Weissman, Tsachy Ji, Hanlee P Nucleic Acids Res Web Server Issue K-mers are short DNA sequences that are used for genome sequence analysis. Applications that use k-mers include genome assembly and alignment. However, the wider bioinformatic use of these short sequences faces challenges related to the massive scale of genomic sequence data. A single human genome assembly has billions of k-mers. As a result, the computational requirements for analyzing k-mer information are enormous, particularly when complete genome assemblies are involved. To address these issues, we developed a new indexing data structure based on a hash table tuned for the lookup of short sequence keys. This web application, referred to as KmerKeys, provides rapid query speeds for cloud computation on genome assemblies. We enable fuzzy as well as exact sequence searches of assemblies. To enable robust and speedy performance, the website implements cache-friendly hash tables, memory mapping and massively parallel processing. Our method employs a scalable and efficient data structure that can be used to jointly index and search a large collection of human genome assembly information. One can include variant databases and their associated metadata, such as the gnomAD population variant catalogue. This feature enables the incorporation of future genomic information into sequencing analysis. KmerKeys is freely accessible at https://kmerkeys.dgi-stanford.org. Oxford University Press 2022-04-26 /pmc/articles/PMC9252721/ /pubmed/35474383 http://dx.doi.org/10.1093/nar/gkac266 Text en © The Author(s) 2022. Published by Oxford University Press on behalf of Nucleic Acids Research.
https://creativecommons.org/licenses/by/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
title KmerKeys: a web resource for searching indexed genome assemblies and variants
topic Web Server Issue
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9252721/
https://www.ncbi.nlm.nih.gov/pubmed/35474383
http://dx.doi.org/10.1093/nar/gkac266