Dashing: fast and accurate genomic distances with HyperLogLog
Dashing is a fast and accurate software tool for estimating similarities of genomes or sequencing datasets. It uses the HyperLogLog sketch together with cardinality estimation methods that are specialized for set unions and intersections. Dashing summarizes genomes more rapidly than previous MinHash...
Main Authors: | Baker, Daniel N., Langmead, Ben |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central 2019 |
Subjects: | Software |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6892282/ https://www.ncbi.nlm.nih.gov/pubmed/31801633 http://dx.doi.org/10.1186/s13059-019-1875-0 |
_version_ | 1783476001340129280 |
---|---|
author | Baker, Daniel N. Langmead, Ben |
author_facet | Baker, Daniel N. Langmead, Ben |
author_sort | Baker, Daniel N. |
collection | PubMed |
description | Dashing is a fast and accurate software tool for estimating similarities of genomes or sequencing datasets. It uses the HyperLogLog sketch together with cardinality estimation methods that are specialized for set unions and intersections. Dashing summarizes genomes more rapidly than previous MinHash-based methods while providing greater accuracy across a wide range of input sizes and sketch sizes. It can sketch and calculate pairwise distances for over 87K genomes in 6 minutes. Dashing is open source and available at https://github.com/dnbaker/dashing. |
format | Online Article Text |
id | pubmed-6892282 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2019 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-68922822019-12-11 Dashing: fast and accurate genomic distances with HyperLogLog Baker, Daniel N. Langmead, Ben Genome Biol Software Dashing is a fast and accurate software tool for estimating similarities of genomes or sequencing datasets. It uses the HyperLogLog sketch together with cardinality estimation methods that are specialized for set unions and intersections. Dashing summarizes genomes more rapidly than previous MinHash-based methods while providing greater accuracy across a wide range of input sizes and sketch sizes. It can sketch and calculate pairwise distances for over 87K genomes in 6 minutes. Dashing is open source and available at https://github.com/dnbaker/dashing. BioMed Central 2019-12-04 /pmc/articles/PMC6892282/ /pubmed/31801633 http://dx.doi.org/10.1186/s13059-019-1875-0 Text en © The Author(s) 2019 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License(http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver(http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated. |
spellingShingle | Software Baker, Daniel N. Langmead, Ben Dashing: fast and accurate genomic distances with HyperLogLog |
title | Dashing: fast and accurate genomic distances with HyperLogLog |
title_full | Dashing: fast and accurate genomic distances with HyperLogLog |
title_fullStr | Dashing: fast and accurate genomic distances with HyperLogLog |
title_full_unstemmed | Dashing: fast and accurate genomic distances with HyperLogLog |
title_short | Dashing: fast and accurate genomic distances with HyperLogLog |
title_sort | dashing: fast and accurate genomic distances with hyperloglog |
topic | Software |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6892282/ https://www.ncbi.nlm.nih.gov/pubmed/31801633 http://dx.doi.org/10.1186/s13059-019-1875-0 |
work_keys_str_mv | AT bakerdanieln dashingfastandaccurategenomicdistanceswithhyperloglog AT langmeadben dashingfastandaccurategenomicdistanceswithhyperloglog |