Locality-Sensitive Hashing-Based k-Mer Clustering for Identification of Differential Microbial Markers Related to Host Phenotype

Bibliographic Details
Main authors: Han, Wontack; Tang, Haixu; Ye, Yuzhen
Format: Online Article Text
Language: English
Published: Mary Ann Liebert, Inc., publishers, 2022
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9464365/
https://www.ncbi.nlm.nih.gov/pubmed/35584271
http://dx.doi.org/10.1089/cmb.2021.0640
author Han, Wontack
Tang, Haixu
Ye, Yuzhen
collection PubMed
description Microbial organisms play important roles in many aspects of human health and diseases. Encouraged by the numerous studies that show the association between microbiomes and human diseases, computational and machine learning methods have been recently developed to generate and utilize microbiome features for prediction of host phenotypes such as disease versus healthy and cancer immunotherapy responder versus nonresponder. We have previously developed a subtractive assembly approach, which focuses on extraction and assembly of differential reads from metagenomic data sets that are likely sampled from differential genomes or genes between two groups of microbiome data sets (e.g., healthy vs. disease). In this article, we further improved our subtractive assembly approach by utilizing groups of k-mers with similar abundance profiles across multiple samples. We implemented a locality-sensitive hashing (LSH)-enabled approach (called kmerLSHSA) to group billions of k-mers into k-mer coabundance groups (kCAGs), which were subsequently used for the retrieval of differential kCAGs for subtractive assembly. Testing of the kmerLSHSA approach on simulated data sets and real microbiome data sets showed that, compared with the conventional approach that utilizes all genes, our approach can quickly identify differential genes that can be used for building promising predictive models for microbiome-based host phenotype prediction. We also discussed other potential applications of LSH-enabled clustering of k-mers according to their abundance profiles across multiple microbiome samples.
format Online
Article
Text
id pubmed-9464365
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher Mary Ann Liebert, Inc., publishers
record_format MEDLINE/PubMed
spelling pubmed-9464365 2022-09-12 J Comput Biol Research Articles Mary Ann Liebert, Inc., publishers 2022-07-01 2022-07-06 /pmc/articles/PMC9464365/ /pubmed/35584271 http://dx.doi.org/10.1089/cmb.2021.0640 Text en © Wontack Han, et al., 2022.
Published by Mary Ann Liebert, Inc. This Open Access article is distributed under the terms of the Creative Commons Attribution Noncommercial License (https://creativecommons.org/licenses/by-nc/4.0/), which permits any noncommercial use, distribution, and reproduction in any medium, provided the original author(s) and the source are credited.
title Locality-Sensitive Hashing-Based k-Mer Clustering for Identification of Differential Microbial Markers Related to Host Phenotype
topic Research Articles
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9464365/
https://www.ncbi.nlm.nih.gov/pubmed/35584271
http://dx.doi.org/10.1089/cmb.2021.0640
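The abstract names the technique but does not describe the hashing scheme or the differential test. The sketch below illustrates the general idea only and is not the kmerLSHSA implementation.

It rests on several assumptions:

- Random-hyperplane (SimHash) LSH is applied to depth-normalized k-mer abundance profiles.
- Each LSH bucket is treated as a crude k-mer coabundance group (kCAG).
- A group is flagged as differential when the log2 fold change of its mean summed abundance between two sample classes passes a threshold.

All function names and parameters are hypothetical choices: `count_kmers`, `abundance_profiles`, `group_kmers`, `differential_groups`, the 16-bit signature, the threshold, and the pseudocount. The paper's actual method may differ.

```python
# Hypothetical sketch: random-hyperplane LSH stands in for the kmerLSHSA
# hashing scheme, which this record does not describe.
import random
from collections import Counter, defaultdict
from math import log2


def count_kmers(reads, k):
    """Count every length-k substring in a list of reads (no reverse-complement merging)."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def abundance_profiles(samples, k):
    """Map each k-mer to its abundance in every sample, divided by that sample's total k-mer count."""
    per_sample = [count_kmers(reads, k) for reads in samples]
    totals = [sum(c.values()) or 1 for c in per_sample]  # avoid dividing by zero
    kmers = set().union(*per_sample)
    return {km: [c[km] / t for c, t in zip(per_sample, totals)] for km in kmers}


def lsh_signature(vec, planes):
    """SimHash signature: one bit per hyperplane, True if vec lies on its nonnegative side."""
    return tuple(sum(p * v for p, v in zip(plane, vec)) >= 0 for plane in planes)


def group_kmers(profiles, n_bits=16, seed=0):
    """Bucket k-mers whose profiles share a signature; similar directions (cosine) tend to collide."""
    if not profiles:
        return []
    rng = random.Random(seed)
    n_samples = len(next(iter(profiles.values())))
    planes = [[rng.gauss(0, 1) for _ in range(n_samples)] for _ in range(n_bits)]
    groups = defaultdict(list)
    for km, vec in profiles.items():
        groups[lsh_signature(vec, planes)].append(km)
    return list(groups.values())


def differential_groups(groups, profiles, labels, min_log2fc=1.0, pseudo=1e-6):
    """Return (group, log2 fold change of class 1 over class 0) for groups passing the threshold.

    labels holds 0 or 1 per sample; both classes must be present.
    """
    selected = []
    for group in groups:
        # Summed normalized abundance of the whole group in each sample.
        per_sample = [sum(profiles[km][j] for km in group) for j in range(len(labels))]
        a = [x for x, y in zip(per_sample, labels) if y == 0]
        b = [x for x, y in zip(per_sample, labels) if y == 1]
        lfc = log2((sum(b) / len(b) + pseudo) / (sum(a) / len(a) + pseudo))
        if abs(lfc) >= min_log2fc:
            selected.append((group, lfc))
    return selected
```

For example, with four metagenomes (two healthy, two disease) given as lists of read strings, one might call `groups = group_kmers(abundance_profiles(samples, 21))` and then `differential_groups(groups, profiles, [0, 0, 1, 1])`.

K-mers with identical normalized profiles always share a signature. Those with merely similar profiles collide with a probability that falls as the angle between them grows.

A single hash table gives one coarse partition. LSH-based clustering usually combines several tables or banded signatures to balance recall against precision.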