Cargando…
A framework for space-efficient read clustering in metagenomic samples
BACKGROUND: A metagenomic sample is a set of DNA fragments, randomly extracted from multiple cells in an environment, belonging to distinct, often unknown species. Unsupervised metagenomic clustering aims at partitioning a metagenomic sample into sets that approximate taxonomic units, without using...
Autores principales: | , , , |
---|---|
Formato: | Online Artículo Texto |
Lenguaje: | English |
Publicado: |
BioMed Central
2017
|
Materias: | |
Acceso en línea: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5374685/ https://www.ncbi.nlm.nih.gov/pubmed/28361710 http://dx.doi.org/10.1186/s12859-017-1466-6 |
_version_ | 1782518942941577216 |
---|---|
author | Alanko, Jarno Cunial, Fabio Belazzougui, Djamal Mäkinen, Veli |
author_facet | Alanko, Jarno Cunial, Fabio Belazzougui, Djamal Mäkinen, Veli |
author_sort | Alanko, Jarno |
collection | PubMed |
description | BACKGROUND: A metagenomic sample is a set of DNA fragments, randomly extracted from multiple cells in an environment, belonging to distinct, often unknown species. Unsupervised metagenomic clustering aims at partitioning a metagenomic sample into sets that approximate taxonomic units, without using reference genomes. Since samples are large and steadily growing, space-efficient clustering algorithms are strongly needed. RESULTS: We design and implement a space-efficient algorithmic framework that solves a number of core primitives in unsupervised metagenomic clustering using just the bidirectional Burrows-Wheeler index and a union-find data structure on the set of reads. When run on a sample of total length n, with m reads of maximum length ℓ each, on an alphabet of total size σ, our algorithms take O(n(t+logσ)) time and just 2n+o(n)+O(max{ℓ σlogn,K logm}) bits of space in addition to the index and to the union-find data structure, where K is a measure of the redundancy of the sample and t is the query time of the union-find data structure. CONCLUSIONS: Our experimental results show that our algorithms are practical, they can exploit multiple cores by a parallel traversal of the suffix-link tree, and they are competitive both in space and in time with the state of the art. |
format | Online Article Text |
id | pubmed-5374685 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2017 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-53746852017-04-03 A framework for space-efficient read clustering in metagenomic samples Alanko, Jarno Cunial, Fabio Belazzougui, Djamal Mäkinen, Veli BMC Bioinformatics Research BACKGROUND: A metagenomic sample is a set of DNA fragments, randomly extracted from multiple cells in an environment, belonging to distinct, often unknown species. Unsupervised metagenomic clustering aims at partitioning a metagenomic sample into sets that approximate taxonomic units, without using reference genomes. Since samples are large and steadily growing, space-efficient clustering algorithms are strongly needed. RESULTS: We design and implement a space-efficient algorithmic framework that solves a number of core primitives in unsupervised metagenomic clustering using just the bidirectional Burrows-Wheeler index and a union-find data structure on the set of reads. When run on a sample of total length n, with m reads of maximum length ℓ each, on an alphabet of total size σ, our algorithms take O(n(t+logσ)) time and just 2n+o(n)+O(max{ℓ σlogn,K logm}) bits of space in addition to the index and to the union-find data structure, where K is a measure of the redundancy of the sample and t is the query time of the union-find data structure. CONCLUSIONS: Our experimental results show that our algorithms are practical, they can exploit multiple cores by a parallel traversal of the suffix-link tree, and they are competitive both in space and in time with the state of the art. BioMed Central 2017-03-14 /pmc/articles/PMC5374685/ /pubmed/28361710 http://dx.doi.org/10.1186/s12859-017-1466-6 Text en © The Author(s) 2017 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated. |
spellingShingle | Research Alanko, Jarno Cunial, Fabio Belazzougui, Djamal Mäkinen, Veli A framework for space-efficient read clustering in metagenomic samples |
title | A framework for space-efficient read clustering in metagenomic samples |
title_full | A framework for space-efficient read clustering in metagenomic samples |
title_fullStr | A framework for space-efficient read clustering in metagenomic samples |
title_full_unstemmed | A framework for space-efficient read clustering in metagenomic samples |
title_short | A framework for space-efficient read clustering in metagenomic samples |
title_sort | framework for space-efficient read clustering in metagenomic samples |
topic | Research |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5374685/ https://www.ncbi.nlm.nih.gov/pubmed/28361710 http://dx.doi.org/10.1186/s12859-017-1466-6 |
work_keys_str_mv | AT alankojarno aframeworkforspaceefficientreadclusteringinmetagenomicsamples AT cunialfabio aframeworkforspaceefficientreadclusteringinmetagenomicsamples AT belazzouguidjamal aframeworkforspaceefficientreadclusteringinmetagenomicsamples AT makinenveli aframeworkforspaceefficientreadclusteringinmetagenomicsamples AT alankojarno frameworkforspaceefficientreadclusteringinmetagenomicsamples AT cunialfabio frameworkforspaceefficientreadclusteringinmetagenomicsamples AT belazzouguidjamal frameworkforspaceefficientreadclusteringinmetagenomicsamples AT makinenveli frameworkforspaceefficientreadclusteringinmetagenomicsamples |