
16S rRNA metagenome clustering and diversity estimation using locality sensitive hashing

BACKGROUND: Advances in biotechnology have changed the manner of characterizing large populations of microbial communities that are ubiquitous across several environments. "Metagenome" sequencing involves decoding the DNA of organisms co-existing within ecosystems ranging from ocean, soil a...


Bibliographic Details
Main Authors: Rasheed, Zeehasham, Rangwala, Huzefa, Barbará, Daniel
Format: Online Article Text
Language: English
Published: BioMed Central 2013
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3854655/
https://www.ncbi.nlm.nih.gov/pubmed/24565031
http://dx.doi.org/10.1186/1752-0509-7-S4-S11
_version_ 1782294841049219072
author Rasheed, Zeehasham
Rangwala, Huzefa
Barbará, Daniel
author_facet Rasheed, Zeehasham
Rangwala, Huzefa
Barbará, Daniel
author_sort Rasheed, Zeehasham
collection PubMed
description BACKGROUND: Advances in biotechnology have changed the manner of characterizing large populations of microbial communities that are ubiquitous across several environments. "Metagenome" sequencing involves decoding the DNA of organisms co-existing within ecosystems ranging from the ocean and soil to the human body. Several researchers are interested in metagenomics because it provides insight into the complex biodiversity across several environments. Clinicians are using metagenomics to determine the role played by collections of microbial organisms within the human body with respect to human health, wellness, and disease. RESULTS: We have developed an efficient and scalable species richness estimation algorithm that uses locality sensitive hashing (LSH). Our algorithm achieves efficiency by approximating the pairwise sequence comparison operations using hashing, and it also incorporates a criterion that matches fixed-length, gapless subsequences to improve the quality of sequence comparisons. We use an LSH-based similarity function to cluster similar sequences into individual groups, called operational taxonomic units (OTUs). We also compute different species diversity/richness metrics from the OTU assignment results to further extend our analysis. CONCLUSION: The algorithm is evaluated on synthetic samples and eight targeted 16S rRNA metagenome samples taken from seawater. We compare the performance of our algorithm with several competing diversity estimation algorithms. We show the benefits of our approach with respect to computational runtime and meaningful OTU assignments. We also demonstrate the practical significance of the developed algorithm by comparing bacterial diversity and structure across different skin locations. WEBSITE: http://www.cs.gmu.edu/~mlbio/LSH-DIV
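The RESULTS portion of the description can be illustrated with a minimal sketch. The abstract does not say which LSH family the authors use, so the code below assumes MinHash over k-mers (fixed-length, gapless subsequences) with banding to find candidate OTUs, followed by two standard diversity measures (the Shannon index and the bias-corrected Chao1 richness estimate). All function names, the k-mer length, the band and row counts, and the similarity threshold are hypothetical choices for illustration, not values taken from the paper.

```python
import hashlib
import math
from collections import Counter


def kmers(seq, k=4):
    """Return the set of fixed-length, gapless subsequences (k-mers) of seq."""
    if len(seq) < k:
        raise ValueError("sequence is shorter than k")
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def _hash(seed, kmer):
    """Deterministic 64-bit hash of a k-mer under a given integer seed."""
    digest = hashlib.blake2b(f"{seed}:{kmer}".encode(), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def minhash_signature(seq, k=4, num_hashes=64):
    """MinHash signature: for each seeded hash, the minimum over all k-mers."""
    ks = kmers(seq, k)
    return [min(_hash(seed, km) for km in ks) for seed in range(num_hashes)]


def estimated_similarity(sig_a, sig_b):
    """Fraction of equal signature positions; approximates Jaccard similarity."""
    return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)


def lsh_otus(seqs, threshold=0.8, k=4, bands=16, rows=4):
    """Greedy OTU assignment (assumed scheme): only OTU representatives that
    share at least one LSH band with the query are compared against it."""
    buckets = {}  # (band index, band values) -> ids of OTUs hashed there
    reps = []     # MinHash signature of each OTU's representative sequence
    labels = []
    for seq in seqs:
        sig = minhash_signature(seq, k, bands * rows)
        keys = [(b, tuple(sig[b * rows:(b + 1) * rows])) for b in range(bands)]
        candidates = sorted({otu for key in keys for otu in buckets.get(key, ())})
        match = next((otu for otu in candidates
                      if estimated_similarity(sig, reps[otu]) >= threshold), None)
        if match is None:
            # No similar representative found: this sequence seeds a new OTU.
            match = len(reps)
            reps.append(sig)
            for key in keys:
                buckets.setdefault(key, []).append(match)
        labels.append(match)
    return labels


def shannon_index(labels):
    """Shannon diversity H = -sum(p_i * ln p_i) over OTU relative abundances."""
    counts = Counter(labels).values()
    n = sum(counts)
    return -sum((c / n) * math.log(c / n) for c in counts)


def chao1(labels):
    """Bias-corrected Chao1: S_obs + F1 * (F1 - 1) / (2 * (F2 + 1)),
    where F1 and F2 count OTUs seen exactly once and exactly twice."""
    counts = Counter(labels).values()
    f1 = sum(1 for c in counts if c == 1)
    f2 = sum(1 for c in counts if c == 2)
    return len(counts) + f1 * (f1 - 1) / (2 * (f2 + 1))
```

Calling `lsh_otus(reads)` on a list of DNA strings returns one integer OTU label per read, and those labels can be passed directly to `shannon_index` and `chao1`. Banding is what avoids the all-against-all comparison: a read is scored only against representatives that collide with it in at least one band.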
format Online
Article
Text
id pubmed-3854655
institution National Center for Biotechnology Information
language English
publishDate 2013
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-3854655 2013-12-16 16S rRNA metagenome clustering and diversity estimation using locality sensitive hashing Rasheed, Zeehasham Rangwala, Huzefa Barbará, Daniel BMC Syst Biol Research BioMed Central 2013-10-23 /pmc/articles/PMC3854655/ /pubmed/24565031 http://dx.doi.org/10.1186/1752-0509-7-S4-S11 Text en Copyright © 2013 Rasheed et al.; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
spellingShingle Research
Rasheed, Zeehasham
Rangwala, Huzefa
Barbará, Daniel
16S rRNA metagenome clustering and diversity estimation using locality sensitive hashing
title 16S rRNA metagenome clustering and diversity estimation using locality sensitive hashing
title_full 16S rRNA metagenome clustering and diversity estimation using locality sensitive hashing
title_fullStr 16S rRNA metagenome clustering and diversity estimation using locality sensitive hashing
title_full_unstemmed 16S rRNA metagenome clustering and diversity estimation using locality sensitive hashing
title_short 16S rRNA metagenome clustering and diversity estimation using locality sensitive hashing
title_sort 16s rrna metagenome clustering and diversity estimation using locality sensitive hashing
topic Research
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3854655/
https://www.ncbi.nlm.nih.gov/pubmed/24565031
http://dx.doi.org/10.1186/1752-0509-7-S4-S11
work_keys_str_mv AT rasheedzeehasham 16srrnametagenomeclusteringanddiversityestimationusinglocalitysensitivehashing
AT rangwalahuzefa 16srrnametagenomeclusteringanddiversityestimationusinglocalitysensitivehashing
AT barbaradaniel 16srrnametagenomeclusteringanddiversityestimationusinglocalitysensitivehashing