Mash: fast genome and metagenome distance estimation using MinHash
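Main authors: | Ondov, Brian D.; Treangen, Todd J.; Melsted, Páll; Mallonee, Adam B.; Bergman, Nicholas H.; Koren, Sergey; Phillippy, Adam M. |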
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central, 2016 |
Subjects: | Software |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4915045/ https://www.ncbi.nlm.nih.gov/pubmed/27323842 http://dx.doi.org/10.1186/s13059-016-0997-x |
author | Ondov, Brian D.; Treangen, Todd J.; Melsted, Páll; Mallonee, Adam B.; Bergman, Nicholas H.; Koren, Sergey; Phillippy, Adam M. |
---|---|
collection | PubMed |
description | Mash extends the MinHash dimensionality-reduction technique to include a pairwise mutation distance and P value significance test, enabling the efficient clustering and search of massive sequence collections. Mash reduces large sequences and sequence sets to small, representative sketches, from which global mutation distances can be rapidly estimated. We demonstrate several use cases, including the clustering of all 54,118 NCBI RefSeq genomes in 33 CPU h; real-time database search using assembled or unassembled Illumina, Pacific Biosciences, and Oxford Nanopore data; and the scalable clustering of hundreds of metagenomic samples by composition. Mash is freely released under a BSD license (https://github.com/marbl/mash). ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s13059-016-0997-x) contains supplementary material, which is available to authorized users. |
id | pubmed-4915045 |
institution | National Center for Biotechnology Information |
record_format | MEDLINE/PubMed |
spelling | pubmed-4915045; 2016-06-22; Genome Biol; Software; BioMed Central, 2016-06-20; Text; en; © The Author(s). 2016 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated. |
title | Mash: fast genome and metagenome distance estimation using MinHash |
topic | Software |
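The description above says Mash reduces sequences to small sketches from which mutation distances can be rapidly estimated. The following is a minimal, illustrative Python sketch of that general MinHash idea, not Mash's implementation or API. It makes these assumptions: canonical k-mers are hashed with BLAKE2b as a stand-in for the MurmurHash3 hash Mash uses; a sketch is the s smallest distinct hashes; the Jaccard index is estimated from the s smallest hashes of the union of two sketches; and that estimate is converted to the Mash distance D = -(1/k) ln(2j/(1+j)). The defaults k = 21 and s = 1000 mirror Mash's command-line defaults. The function names (`canonical_kmers`, `sketch`, `jaccard_estimate`, `mash_distance`) are hypothetical, and the P value test mentioned in the abstract is not shown.

```python
import hashlib
import math
import random

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical_kmers(seq: str, k: int):
    """Yield canonical k-mers: the lexicographically smaller of each k-mer and its reverse complement."""
    seq = seq.upper()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if set(kmer) <= set("ACGT"):  # skip k-mers containing N or other ambiguity codes
            rc = kmer.translate(COMPLEMENT)[::-1]
            yield min(kmer, rc)


def hash64(kmer: str) -> int:
    """64-bit hash of a k-mer. Assumption: BLAKE2b stands in for MurmurHash3."""
    digest = hashlib.blake2b(kmer.encode(), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def sketch(seq: str, k: int = 21, s: int = 1000) -> list[int]:
    """Bottom-s MinHash sketch: the s smallest distinct k-mer hashes, in ascending order."""
    return sorted({hash64(km) for km in canonical_kmers(seq, k)})[:s]


def jaccard_estimate(a: list[int], b: list[int], s: int) -> float:
    """Estimate the Jaccard index from the s smallest hashes of the union of two sketches."""
    union_bottom = sorted(set(a) | set(b))[:s]
    if not union_bottom:
        return 0.0
    shared = set(a) & set(b)  # hashes present in both sketches
    return sum(1 for h in union_bottom if h in shared) / len(union_bottom)


def mash_distance(j: float, k: int) -> float:
    """Mash distance D = -(1/k) * ln(2j / (1 + j)), computed as ln((1 + j) / (2j)) / k."""
    if j == 0:
        return 1.0  # no shared hashes: this sketch reports the maximum distance 1.0
    return math.log((1 + j) / (2 * j)) / k


if __name__ == "__main__":
    rng = random.Random(42)
    genome = "".join(rng.choice("ACGT") for _ in range(50_000))
    # Illustrative data only: substitute roughly 1% of positions with a different base.
    mutated = "".join(
        rng.choice([b for b in "ACGT" if b != base]) if rng.random() < 0.01 else base
        for base in genome
    )
    k, s = 21, 1000
    j = jaccard_estimate(sketch(genome, k, s), sketch(mutated, k, s), s)
    print(f"Jaccard estimate: {j:.3f}  Mash distance: {mash_distance(j, k):.4f}")
```

Because the Mash distance is designed to approximate the per-base mutation rate, the demo's distance for a copy with roughly 1% substitutions should be on the order of 0.01, with some sampling noise from the fixed sketch size. Keeping only the s smallest hashes is what makes each sketch a fixed, small size regardless of genome length.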