KmerKeys: a web resource for searching indexed genome assemblies and variants
K-mers are short DNA sequences that are used for genome sequence analysis. Applications that use k-mers include genome assembly and alignment. However, the wider bioinformatic use of these short sequences has challenges related to the massive scale of genomic sequence data. A single human genome ass...
Main Authors: | Pavlichin, Dmitri S; Lee, HoJoon; Greer, Stephanie U; Grimes, Susan M; Weissman, Tsachy; Ji, Hanlee P |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Oxford University Press 2022 |
Subjects: | Web Server Issue |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9252721/ https://www.ncbi.nlm.nih.gov/pubmed/35474383 http://dx.doi.org/10.1093/nar/gkac266 |
_version_ | 1784740331682529280 |
---|---|
author | Pavlichin, Dmitri S Lee, HoJoon Greer, Stephanie U Grimes, Susan M Weissman, Tsachy Ji, Hanlee P |
author_facet | Pavlichin, Dmitri S Lee, HoJoon Greer, Stephanie U Grimes, Susan M Weissman, Tsachy Ji, Hanlee P |
author_sort | Pavlichin, Dmitri S |
collection | PubMed |
description | K-mers are short DNA sequences that are used for genome sequence analysis. Applications that use k-mers include genome assembly and alignment. However, the wider bioinformatic use of these short sequences faces challenges related to the massive scale of genomic sequence data. A single human genome assembly has billions of k-mers. As a result, the computational requirements for analyzing k-mer information are enormous, particularly when complete genome assemblies are involved. To address these issues, we developed a new indexing data structure based on a hash table tuned for the lookup of short sequence keys. This web application, referred to as KmerKeys, provides rapid query speeds for cloud computation on genome assemblies. We enable fuzzy as well as exact sequence searches of assemblies. To enable robust and speedy performance, the website implements cache-friendly hash tables, memory mapping and massively parallel processing. Our method employs a scalable and efficient data structure that can be used to jointly index and search a large collection of human genome assembly information. One can include variant databases and their associated metadata, such as the gnomAD population variant catalogue. This feature enables the incorporation of future genomic information into sequencing analysis. KmerKeys is freely accessible at https://kmerkeys.dgi-stanford.org. |
format | Online Article Text |
id | pubmed-9252721 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2022 |
publisher | Oxford University Press |
record_format | MEDLINE/PubMed |
spelling | pubmed-9252721 2022-07-05 KmerKeys: a web resource for searching indexed genome assemblies and variants Pavlichin, Dmitri S Lee, HoJoon Greer, Stephanie U Grimes, Susan M Weissman, Tsachy Ji, Hanlee P Nucleic Acids Res Web Server Issue K-mers are short DNA sequences that are used for genome sequence analysis. Applications that use k-mers include genome assembly and alignment. However, the wider bioinformatic use of these short sequences faces challenges related to the massive scale of genomic sequence data. A single human genome assembly has billions of k-mers. As a result, the computational requirements for analyzing k-mer information are enormous, particularly when complete genome assemblies are involved. To address these issues, we developed a new indexing data structure based on a hash table tuned for the lookup of short sequence keys. This web application, referred to as KmerKeys, provides rapid query speeds for cloud computation on genome assemblies. We enable fuzzy as well as exact sequence searches of assemblies. To enable robust and speedy performance, the website implements cache-friendly hash tables, memory mapping and massively parallel processing. Our method employs a scalable and efficient data structure that can be used to jointly index and search a large collection of human genome assembly information. One can include variant databases and their associated metadata, such as the gnomAD population variant catalogue. This feature enables the incorporation of future genomic information into sequencing analysis. KmerKeys is freely accessible at https://kmerkeys.dgi-stanford.org. Oxford University Press 2022-04-26 /pmc/articles/PMC9252721/ /pubmed/35474383 http://dx.doi.org/10.1093/nar/gkac266 Text en © The Author(s) 2022. Published by Oxford University Press on behalf of Nucleic Acids Research. https://creativecommons.org/licenses/by/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited. |
spellingShingle | Web Server Issue Pavlichin, Dmitri S Lee, HoJoon Greer, Stephanie U Grimes, Susan M Weissman, Tsachy Ji, Hanlee P KmerKeys: a web resource for searching indexed genome assemblies and variants |
title | KmerKeys: a web resource for searching indexed genome assemblies and variants |
title_full | KmerKeys: a web resource for searching indexed genome assemblies and variants |
title_fullStr | KmerKeys: a web resource for searching indexed genome assemblies and variants |
title_full_unstemmed | KmerKeys: a web resource for searching indexed genome assemblies and variants |
title_short | KmerKeys: a web resource for searching indexed genome assemblies and variants |
title_sort | kmerkeys: a web resource for searching indexed genome assemblies and variants |
topic | Web Server Issue |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9252721/ https://www.ncbi.nlm.nih.gov/pubmed/35474383 http://dx.doi.org/10.1093/nar/gkac266 |
work_keys_str_mv | AT pavlichindmitris kmerkeysawebresourceforsearchingindexedgenomeassembliesandvariants AT leehojoon kmerkeysawebresourceforsearchingindexedgenomeassembliesandvariants AT greerstephanieu kmerkeysawebresourceforsearchingindexedgenomeassembliesandvariants AT grimessusanm kmerkeysawebresourceforsearchingindexedgenomeassembliesandvariants AT weissmantsachy kmerkeysawebresourceforsearchingindexedgenomeassembliesandvariants AT jihanleep kmerkeysawebresourceforsearchingindexedgenomeassembliesandvariants |
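
The description in this record mentions a hash-table index over k-mers that supports both exact and fuzzy searches. The following is a minimal sketch of that general idea, not the KmerKeys implementation: it uses an ordinary Python dictionary in place of the cache-friendly, memory-mapped table the abstract describes, and it assumes that "fuzzy" means a Hamming distance of at most one, which the record does not state. The function names `build_index`, `exact_search` and `fuzzy_search` are hypothetical.

```python
from collections import defaultdict


def build_index(sequence, k):
    """Map every k-mer in `sequence` to the list of 0-based start positions."""
    index = defaultdict(list)
    for start in range(len(sequence) - k + 1):
        index[sequence[start:start + k]].append(start)
    return dict(index)


def exact_search(index, kmer):
    """Return the positions of an exact k-mer match (empty list if absent)."""
    return index.get(kmer, [])


def fuzzy_search(index, kmer, alphabet="ACGT"):
    """Return indexed k-mers within Hamming distance 1 of `kmer`.

    Assumption: one substitution is allowed. Each single-base variant of the
    query is generated and looked up, so the cost is 3 * k + 1 hash lookups.
    """
    hits = {}
    if kmer in index:
        hits[kmer] = index[kmer]
    for pos, base in enumerate(kmer):
        for alt in alphabet:
            if alt == base:
                continue
            candidate = kmer[:pos] + alt + kmer[pos + 1:]
            if candidate in index:
                hits[candidate] = index[candidate]
    return hits


if __name__ == "__main__":
    idx = build_index("ACGTACGA", k=4)
    print(exact_search(idx, "ACGT"))
    print(fuzzy_search(idx, "ACGT"))
```

Enumerating the neighbours of the query keeps each lookup a constant-time hash probe, which is one common way to add approximate matching on top of an exact-match table; a production index at genome scale would need a far more compact key encoding than Python strings.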