Gene-based microbiome representation enhances host phenotype classification
With the concomitant advances in both the microbiome and machine learning fields, the gut microbiome has become of great interest for the potential discovery of biomarkers to be used in the classification of the host health status. Shotgun metagenomics data derived from the human microbiome is compo...
Main Authors: | Deschênes, Thomas; Tohoundjona, Fred Wilfried Elom; Plante, Pier-Luc; Di Marzo, Vincenzo; Raymond, Frédéric |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | American Society for Microbiology 2023 |
Subjects: | |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10469787/ https://www.ncbi.nlm.nih.gov/pubmed/37404032 http://dx.doi.org/10.1128/msystems.00531-23 |
_version_ | 1785099522641231872 |
---|---|
author | Deschênes, Thomas Tohoundjona, Fred Wilfried Elom Plante, Pier-Luc Di Marzo, Vincenzo Raymond, Frédéric |
author_sort | Deschênes, Thomas |
collection | PubMed |
description | With the concomitant advances in both the microbiome and machine learning fields, the gut microbiome has become of great interest for the potential discovery of biomarkers to be used in the classification of the host health status. Shotgun metagenomics data derived from the human microbiome is composed of a high-dimensional set of microbial features. The use of such complex data for the modeling of host-microbiome interactions remains a challenge as retaining de novo content yields a highly granular set of microbial features. In this study, we compared the prediction performances of machine learning approaches according to different types of data representations derived from shotgun metagenomics. These representations include commonly used taxonomic and functional profiles and the more granular gene cluster approach. For the five case-control datasets used in this study (Type 2 diabetes, obesity, liver cirrhosis, colorectal cancer, and inflammatory bowel disease), gene-based approaches, whether used alone or in combination with reference-based data types, achieved classification performance similar to or better than that of the taxonomic and functional profiles. In addition, we show that using subsets of gene families from specific functional categories of genes highlights the importance of these functions for the host phenotype. This study demonstrates that both reference-free microbiome representations and curated metagenomic annotations can provide relevant representations for machine learning based on metagenomic data. IMPORTANCE: Data representation is an essential part of machine learning performance when using metagenomic data. In this work, we show that different microbiome representations provide varied host phenotype classification performance depending on the dataset. In classification tasks, untargeted microbiome gene content can provide similar or improved classification compared to taxonomical profiling. Feature selection based on biological function also improves classification performance for some pathologies. Function-based feature selection combined with interpretable machine learning algorithms can generate new hypotheses that can potentially be assayed mechanistically. This work thus proposes new approaches to represent microbiome data for machine learning that can potentiate the findings associated with metagenomic data. |
format | Online Article Text |
id | pubmed-10469787 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2023 |
publisher | American Society for Microbiology |
record_format | MEDLINE/PubMed |
spelling | pubmed-104697872023-09-01 Gene-based microbiome representation enhances host phenotype classification Deschênes, Thomas Tohoundjona, Fred Wilfried Elom Plante, Pier-Luc Di Marzo, Vincenzo Raymond, Frédéric mSystems Research Article With the concomitant advances in both the microbiome and machine learning fields, the gut microbiome has become of great interest for the potential discovery of biomarkers to be used in the classification of the host health status. Shotgun metagenomics data derived from the human microbiome is composed of a high-dimensional set of microbial features. The use of such complex data for the modeling of host-microbiome interactions remains a challenge as retaining de novo content yields a highly granular set of microbial features. In this study, we compared the prediction performances of machine learning approaches according to different types of data representations derived from shotgun metagenomics. These representations include commonly used taxonomic and functional profiles and the more granular gene cluster approach. For the five case-control datasets used in this study (Type 2 diabetes, obesity, liver cirrhosis, colorectal cancer, and inflammatory bowel disease), gene-based approaches, whether used alone or in combination with reference-based data types, allowed improved or similar classification performances as the taxonomic and functional profiles. In addition, we show that using subsets of gene families from specific functional categories of genes highlight the importance of these functions on the host phenotype. This study demonstrates that both reference-free microbiome representations and curated metagenomic annotations can provide relevant representations for machine learning based on metagenomic data. IMPORTANCE: Data representation is an essential part of machine learning performance when using metagenomic data. 
In this work, we show that different microbiome representations provide varied host phenotype classification performance depending on the dataset. In classification tasks, untargeted microbiome gene content can provide similar or improved classification compared to taxonomical profiling. Feature selection based on biological function also improves classification performance for some pathologies. Function-based feature selection combined with interpretable machine learning algorithms can generate new hypotheses that can potentially be assayed mechanistically. This work thus proposes new approaches to represent microbiome data for machine learning that can potentiate the findings associated with metagenomic data. American Society for Microbiology 2023-07-05 /pmc/articles/PMC10469787/ /pubmed/37404032 http://dx.doi.org/10.1128/msystems.00531-23 Text en Copyright © 2023 Deschênes et al. https://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution 4.0 International license (https://creativecommons.org/licenses/by/4.0/). |
spellingShingle | Research Article Deschênes, Thomas Tohoundjona, Fred Wilfried Elom Plante, Pier-Luc Di Marzo, Vincenzo Raymond, Frédéric Gene-based microbiome representation enhances host phenotype classification |
title | Gene-based microbiome representation enhances host phenotype classification |
topic | Research Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10469787/ https://www.ncbi.nlm.nih.gov/pubmed/37404032 http://dx.doi.org/10.1128/msystems.00531-23 |