Gene-based microbiome representation enhances host phenotype classification

With the concomitant advances in both the microbiome and machine learning fields, the gut microbiome has become of great interest for the potential discovery of biomarkers to be used in the classification of the host health status. Shotgun metagenomics data derived from the human microbiome is compo...

Bibliographic Details
Main authors: Deschênes, Thomas; Tohoundjona, Fred Wilfried Elom; Plante, Pier-Luc; Di Marzo, Vincenzo; Raymond, Frédéric
Format: Online Article Text
Language: English
Published: American Society for Microbiology 2023
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10469787/
https://www.ncbi.nlm.nih.gov/pubmed/37404032
http://dx.doi.org/10.1128/msystems.00531-23
_version_ 1785099522641231872
author Deschênes, Thomas
Tohoundjona, Fred Wilfried Elom
Plante, Pier-Luc
Di Marzo, Vincenzo
Raymond, Frédéric
author_facet Deschênes, Thomas
Tohoundjona, Fred Wilfried Elom
Plante, Pier-Luc
Di Marzo, Vincenzo
Raymond, Frédéric
author_sort Deschênes, Thomas
collection PubMed
description With the concomitant advances in both the microbiome and machine learning fields, the gut microbiome has become of great interest for the potential discovery of biomarkers to be used in the classification of host health status. Shotgun metagenomics data derived from the human microbiome are composed of a high-dimensional set of microbial features. The use of such complex data for the modeling of host-microbiome interactions remains a challenge, as retaining de novo content yields a highly granular set of microbial features. In this study, we compared the prediction performance of machine learning approaches across different types of data representations derived from shotgun metagenomics. These representations include commonly used taxonomic and functional profiles and the more granular gene cluster approach. For the five case-control datasets used in this study (type 2 diabetes, obesity, liver cirrhosis, colorectal cancer, and inflammatory bowel disease), gene-based approaches, whether used alone or in combination with reference-based data types, yielded classification performance similar to or better than that of the taxonomic and functional profiles. In addition, we show that using subsets of gene families from specific functional categories of genes highlights the importance of these functions for the host phenotype. This study demonstrates that both reference-free microbiome representations and curated metagenomic annotations can provide relevant representations for machine learning based on metagenomic data. IMPORTANCE: Data representation is an essential part of machine learning performance when using metagenomic data. In this work, we show that different microbiome representations provide varied host phenotype classification performance depending on the dataset. In classification tasks, untargeted microbiome gene content can provide similar or improved classification compared to taxonomic profiling. Feature selection based on biological function also improves classification performance for some pathologies. Function-based feature selection combined with interpretable machine learning algorithms can generate new hypotheses that can potentially be tested mechanistically. This work thus proposes new approaches to representing microbiome data for machine learning that can strengthen the findings associated with metagenomic data.
format Online
Article
Text
id pubmed-10469787
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher American Society for Microbiology
record_format MEDLINE/PubMed
spelling pubmed-10469787 2023-09-01 Gene-based microbiome representation enhances host phenotype classification Deschênes, Thomas Tohoundjona, Fred Wilfried Elom Plante, Pier-Luc Di Marzo, Vincenzo Raymond, Frédéric mSystems Research Article American Society for Microbiology 2023-07-05 /pmc/articles/PMC10469787/ /pubmed/37404032 http://dx.doi.org/10.1128/msystems.00531-23 Text en Copyright © 2023 Deschênes et al. https://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution 4.0 International license (https://creativecommons.org/licenses/by/4.0/).
spellingShingle Research Article
Deschênes, Thomas
Tohoundjona, Fred Wilfried Elom
Plante, Pier-Luc
Di Marzo, Vincenzo
Raymond, Frédéric
Gene-based microbiome representation enhances host phenotype classification
title Gene-based microbiome representation enhances host phenotype classification
title_full Gene-based microbiome representation enhances host phenotype classification
title_fullStr Gene-based microbiome representation enhances host phenotype classification
title_full_unstemmed Gene-based microbiome representation enhances host phenotype classification
title_short Gene-based microbiome representation enhances host phenotype classification
title_sort gene-based microbiome representation enhances host phenotype classification
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10469787/
https://www.ncbi.nlm.nih.gov/pubmed/37404032
http://dx.doi.org/10.1128/msystems.00531-23
work_keys_str_mv AT deschenesthomas genebasedmicrobiomerepresentationenhanceshostphenotypeclassification
AT tohoundjonafredwilfriedelom genebasedmicrobiomerepresentationenhanceshostphenotypeclassification
AT plantepierluc genebasedmicrobiomerepresentationenhanceshostphenotypeclassification
AT dimarzovincenzo genebasedmicrobiomerepresentationenhanceshostphenotypeclassification
AT raymondfrederic genebasedmicrobiomerepresentationenhanceshostphenotypeclassification