16S rRNA metagenome clustering and diversity estimation using locality sensitive hashing
BACKGROUND: Advances in biotechnology have changed the manner of characterizing large populations of microbial communities that are ubiquitous across several environments. "Metagenome" sequencing involves decoding the DNA of organisms co-existing within ecosystems ranging from ocean, soil a...
Main Authors: | Rasheed, Zeehasham; Rangwala, Huzefa; Barbará, Daniel |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central 2013 |
Subjects: | |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3854655/ https://www.ncbi.nlm.nih.gov/pubmed/24565031 http://dx.doi.org/10.1186/1752-0509-7-S4-S11 |
_version_ | 1782294841049219072 |
---|---|
author | Rasheed, Zeehasham Rangwala, Huzefa Barbará, Daniel |
author_facet | Rasheed, Zeehasham Rangwala, Huzefa Barbará, Daniel |
author_sort | Rasheed, Zeehasham |
collection | PubMed |
description | BACKGROUND: Advances in biotechnology have changed the manner of characterizing large populations of microbial communities that are ubiquitous across several environments. "Metagenome" sequencing involves decoding the DNA of organisms co-existing within ecosystems ranging from the ocean and soil to the human body. Several researchers are interested in metagenomics because it provides an insight into the complex biodiversity across several environments. Clinicians are using metagenomics to determine the role played by collections of microbial organisms within the human body with respect to human health, wellness, and disease. RESULTS: We have developed an efficient and scalable species richness estimation algorithm that uses locality sensitive hashing (LSH). Our algorithm achieves efficiency by approximating the pairwise sequence comparison operations using hashing and also incorporates a criterion of matching fixed-length, gapless subsequences to improve the quality of sequence comparisons. We use an LSH-based similarity function to cluster similar sequences into individual groups, called operational taxonomic units (OTUs). We also compute different species diversity/richness metrics by utilizing OTU assignment results to further extend our analysis. CONCLUSION: The algorithm is evaluated on synthetic samples and eight targeted 16S rRNA metagenome samples taken from seawater. We compare the performance of our algorithm with several competing diversity estimation algorithms. We show the benefits of our approach with respect to computational runtime and meaningful OTU assignments. We also demonstrate the practical significance of the developed algorithm by comparing bacterial diversity and structure across different skin locations. WEBSITE: http://www.cs.gmu.edu/~mlbio/LSH-DIV |
format | Online Article Text |
id | pubmed-3854655 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2013 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-3854655 2013-12-16 16S rRNA metagenome clustering and diversity estimation using locality sensitive hashing Rasheed, Zeehasham Rangwala, Huzefa Barbará, Daniel BMC Syst Biol Research BACKGROUND: Advances in biotechnology have changed the manner of characterizing large populations of microbial communities that are ubiquitous across several environments. "Metagenome" sequencing involves decoding the DNA of organisms co-existing within ecosystems ranging from the ocean and soil to the human body. Several researchers are interested in metagenomics because it provides an insight into the complex biodiversity across several environments. Clinicians are using metagenomics to determine the role played by collections of microbial organisms within the human body with respect to human health, wellness, and disease. RESULTS: We have developed an efficient and scalable species richness estimation algorithm that uses locality sensitive hashing (LSH). Our algorithm achieves efficiency by approximating the pairwise sequence comparison operations using hashing and also incorporates a criterion of matching fixed-length, gapless subsequences to improve the quality of sequence comparisons. We use an LSH-based similarity function to cluster similar sequences into individual groups, called operational taxonomic units (OTUs). We also compute different species diversity/richness metrics by utilizing OTU assignment results to further extend our analysis. CONCLUSION: The algorithm is evaluated on synthetic samples and eight targeted 16S rRNA metagenome samples taken from seawater. We compare the performance of our algorithm with several competing diversity estimation algorithms. We show the benefits of our approach with respect to computational runtime and meaningful OTU assignments. We also demonstrate the practical significance of the developed algorithm by comparing bacterial diversity and structure across different skin locations. WEBSITE: http://www.cs.gmu.edu/~mlbio/LSH-DIV BioMed Central 2013-10-23 /pmc/articles/PMC3854655/ /pubmed/24565031 http://dx.doi.org/10.1186/1752-0509-7-S4-S11 Text en Copyright © 2013 Rasheed et al.; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. |
spellingShingle | Research Rasheed, Zeehasham Rangwala, Huzefa Barbará, Daniel 16S rRNA metagenome clustering and diversity estimation using locality sensitive hashing |
title | 16S rRNA metagenome clustering and diversity estimation using locality sensitive hashing |
title_full | 16S rRNA metagenome clustering and diversity estimation using locality sensitive hashing |
title_fullStr | 16S rRNA metagenome clustering and diversity estimation using locality sensitive hashing |
title_full_unstemmed | 16S rRNA metagenome clustering and diversity estimation using locality sensitive hashing |
title_short | 16S rRNA metagenome clustering and diversity estimation using locality sensitive hashing |
title_sort | 16s rrna metagenome clustering and diversity estimation using locality sensitive hashing |
topic | Research |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3854655/ https://www.ncbi.nlm.nih.gov/pubmed/24565031 http://dx.doi.org/10.1186/1752-0509-7-S4-S11 |
work_keys_str_mv | AT rasheedzeehasham 16srrnametagenomeclusteringanddiversityestimationusinglocalitysensitivehashing AT rangwalahuzefa 16srrnametagenomeclusteringanddiversityestimationusinglocalitysensitivehashing AT barbaradaniel 16srrnametagenomeclusteringanddiversityestimationusinglocalitysensitivehashing |