Locality-Sensitive Hashing-Based k-Mer Clustering for Identification of Differential Microbial Markers Related to Host Phenotype
Microbial organisms play important roles in many aspects of human health and diseases. Encouraged by the numerous studies that show the association between microbiomes and human diseases, computational and machine learning methods have been recently developed to generate and utilize microbiome featu...
Main Authors: | Han, Wontack; Tang, Haixu; Ye, Yuzhen |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Mary Ann Liebert, Inc., publishers 2022 |
Subjects: | |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9464365/ https://www.ncbi.nlm.nih.gov/pubmed/35584271 http://dx.doi.org/10.1089/cmb.2021.0640 |
_version_ | 1784787566486093824 |
---|---|
author | Han, Wontack Tang, Haixu Ye, Yuzhen |
author_facet | Han, Wontack Tang, Haixu Ye, Yuzhen |
author_sort | Han, Wontack |
collection | PubMed |
description | Microbial organisms play important roles in many aspects of human health and diseases. Encouraged by the numerous studies that show the association between microbiomes and human diseases, computational and machine learning methods have been recently developed to generate and utilize microbiome features for prediction of host phenotypes such as disease versus healthy or cancer immunotherapy responder versus nonresponder. We have previously developed a subtractive assembly approach, which focuses on extraction and assembly of differential reads from metagenomic data sets that are likely sampled from differential genomes or genes between two groups of microbiome data sets (e.g., healthy vs. disease). In this article, we further improved our subtractive assembly approach by utilizing groups of k-mers with similar abundance profiles across multiple samples. We implemented a locality-sensitive hashing (LSH)-enabled approach (called kmerLSHSA) to group billions of k-mers into k-mer coabundance groups (kCAGs), which were subsequently used for the retrieval of differential kCAGs for subtractive assembly. Testing of the kmerLSHSA approach on simulated data sets and real microbiome data sets showed that, compared with the conventional approach that utilizes all genes, our approach can quickly identify differential genes that can be used for building promising predictive models for microbiome-based host phenotype prediction. We also discussed other potential applications of LSH-enabled clustering of k-mers according to their abundance profiles across multiple microbiome samples. |
format | Online Article Text |
id | pubmed-9464365 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2022 |
publisher | Mary Ann Liebert, Inc., publishers |
record_format | MEDLINE/PubMed |
spelling | pubmed-94643652022-09-12 Locality-Sensitive Hashing-Based k-Mer Clustering for Identification of Differential Microbial Markers Related to Host Phenotype Han, Wontack Tang, Haixu Ye, Yuzhen J Comput Biol Research Articles Microbial organisms play important roles in many aspects of human health and diseases. Encouraged by the numerous studies that show the association between microbiomes and human diseases, computational and machine learning methods have been recently developed to generate and utilize microbiome features for prediction of host phenotypes such as disease versus healthy or cancer immunotherapy responder versus nonresponder. We have previously developed a subtractive assembly approach, which focuses on extraction and assembly of differential reads from metagenomic data sets that are likely sampled from differential genomes or genes between two groups of microbiome data sets (e.g., healthy vs. disease). In this article, we further improved our subtractive assembly approach by utilizing groups of k-mers with similar abundance profiles across multiple samples. We implemented a locality-sensitive hashing (LSH)-enabled approach (called kmerLSHSA) to group billions of k-mers into k-mer coabundance groups (kCAGs), which were subsequently used for the retrieval of differential kCAGs for subtractive assembly. Testing of the kmerLSHSA approach on simulated data sets and real microbiome data sets showed that, compared with the conventional approach that utilizes all genes, our approach can quickly identify differential genes that can be used for building promising predictive models for microbiome-based host phenotype prediction. We also discussed other potential applications of LSH-enabled clustering of k-mers according to their abundance profiles across multiple microbiome samples.
Mary Ann Liebert, Inc., publishers 2022-07-01 2022-07-06 /pmc/articles/PMC9464365/ /pubmed/35584271 http://dx.doi.org/10.1089/cmb.2021.0640 Text en © Wontack Han, et al., 2022. Published by Mary Ann Liebert, Inc. https://creativecommons.org/licenses/by-nc/4.0/ This Open Access article is distributed under the terms of the Creative Commons Attribution Noncommercial License (http://creativecommons.org/licenses/by-nc/4.0/), which permits any noncommercial use, distribution, and reproduction in any medium, provided the original author(s) and the source are credited. |
spellingShingle | Research Articles Han, Wontack Tang, Haixu Ye, Yuzhen Locality-Sensitive Hashing-Based k-Mer Clustering for Identification of Differential Microbial Markers Related to Host Phenotype |
title | Locality-Sensitive Hashing-Based k-Mer Clustering for Identification of Differential Microbial Markers Related to Host Phenotype |
title_full | Locality-Sensitive Hashing-Based k-Mer Clustering for Identification of Differential Microbial Markers Related to Host Phenotype |
title_fullStr | Locality-Sensitive Hashing-Based k-Mer Clustering for Identification of Differential Microbial Markers Related to Host Phenotype |
title_full_unstemmed | Locality-Sensitive Hashing-Based k-Mer Clustering for Identification of Differential Microbial Markers Related to Host Phenotype |
title_short | Locality-Sensitive Hashing-Based k-Mer Clustering for Identification of Differential Microbial Markers Related to Host Phenotype |
title_sort | locality-sensitive hashing-based k-mer clustering for identification of differential microbial markers related to host phenotype |
topic | Research Articles |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9464365/ https://www.ncbi.nlm.nih.gov/pubmed/35584271 http://dx.doi.org/10.1089/cmb.2021.0640 |
work_keys_str_mv | AT hanwontack localitysensitivehashingbasedkmerclusteringforidentificationofdifferentialmicrobialmarkersrelatedtohostphenotype AT tanghaixu localitysensitivehashingbasedkmerclusteringforidentificationofdifferentialmicrobialmarkersrelatedtohostphenotype AT yeyuzhen localitysensitivehashingbasedkmerclusteringforidentificationofdifferentialmicrobialmarkersrelatedtohostphenotype |
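
The description in this record says that k-mers are grouped by the similarity of their abundance profiles across samples using locality-sensitive hashing. As a rough illustration of that general idea only, the sketch below buckets profiles with a sign-random-projection hash. The record does not say which LSH family kmerLSHSA uses, so the hash choice, the function name `lsh_group_profiles`, and the parameters `n_bits` and `seed` are all assumptions made for this example, not the authors' implementation.

```python
from collections import defaultdict

import numpy as np


def lsh_group_profiles(profiles, n_bits=8, seed=0):
    """Bucket abundance profiles (rows = k-mers, columns = samples).

    Hypothetical helper: uses sign random projection, which is only one
    possible LSH family and is assumed here for illustration.
    """
    rng = np.random.default_rng(seed)
    profiles = np.asarray(profiles, dtype=float)

    # Center each profile so that profiles differing only by a constant
    # offset or a positive scale point in the same direction.
    centered = profiles - profiles.mean(axis=1, keepdims=True)

    # One random hyperplane per hash bit; each bit records on which side
    # of the hyperplane the centered profile falls.
    hyperplanes = rng.standard_normal((profiles.shape[1], n_bits))
    bits = (centered @ hyperplanes) >= 0

    # Profiles with identical bit signatures share a bucket.
    buckets = defaultdict(list)
    for kmer_index, signature in enumerate(bits):
        buckets[signature.tobytes()].append(kmer_index)
    return list(buckets.values())


if __name__ == "__main__":
    toy_profiles = [
        [1, 5, 2, 8],      # k-mer 0
        [2, 10, 4, 16],    # k-mer 1: twice k-mer 0
        [11, 15, 12, 18],  # k-mer 2: k-mer 0 plus a constant
        [8, 4, 7, 1],      # k-mer 3: anti-correlated with k-mer 0
    ]
    print(lsh_group_profiles(toy_profiles))
```

In this toy input, k-mers 0 to 2 share one centered direction and therefore hash to the same bucket, while k-mer 3 points the opposite way and lands elsewhere. A practical pipeline would likely use several hash tables and a refinement step inside each bucket, which this sketch omits.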