Host phenotype classification from human microbiome data is mainly driven by the presence of microbial taxa

Bibliographic Details
Main Authors: Giliberti, Renato; Cavaliere, Sara; Mauriello, Italia Elisa; Ercolini, Danilo; Pasolli, Edoardo
Format: Online Article Text
Language: English
Published: Public Library of Science 2022
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9064115/
https://www.ncbi.nlm.nih.gov/pubmed/35446845
http://dx.doi.org/10.1371/journal.pcbi.1010066
author Giliberti, Renato
Cavaliere, Sara
Mauriello, Italia Elisa
Ercolini, Danilo
Pasolli, Edoardo
author_sort Giliberti, Renato
collection PubMed
description Machine learning-based classification approaches are widely used to predict host phenotypes from microbiome data. Classifiers are typically employed by considering operational taxonomic units or relative abundance profiles as input features. Such types of data are intrinsically sparse, which opens the opportunity to make predictions from the presence/absence rather than the relative abundance of microbial taxa. This also poses the question whether it is the presence rather than the abundance of particular taxa to be relevant for discrimination purposes, an aspect that has been so far overlooked in the literature. In this paper, we aim at filling this gap by performing a meta-analysis on 4,128 publicly available metagenomes associated with multiple case-control studies. At species-level taxonomic resolution, we show that it is the presence rather than the relative abundance of specific microbial taxa to be important when building classification models. Such findings are robust to the choice of the classifier and confirmed by statistical tests applied to identifying differentially abundant/present taxa. Results are further confirmed at coarser taxonomic resolutions and validated on 4,026 additional 16S rRNA samples coming from 30 public case-control studies.
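The presence/absence idea in the description can be illustrated with a minimal sketch. It is not the authors' pipeline: the function names `to_presence_absence` and `sparsity`, the default threshold of 0.0, and the toy table are all assumptions made for illustration. The sketch only shows how a samples-by-taxa relative abundance table might be binarized before being passed to a classifier.

```python
from typing import List


def to_presence_absence(abundances: List[List[float]],
                        threshold: float = 0.0) -> List[List[int]]:
    """Binarize a samples-by-taxa relative abundance table.

    A taxon counts as present (1) when its abundance exceeds `threshold`
    (hypothetical default: any non-zero value), otherwise absent (0).
    """
    return [[1 if value > threshold else 0 for value in sample]
            for sample in abundances]


def sparsity(abundances: List[List[float]]) -> float:
    """Return the fraction of entries in the table that are exactly zero."""
    total = sum(len(sample) for sample in abundances)
    zeros = sum(1 for sample in abundances for value in sample if value == 0)
    return zeros / total


# Toy table (made-up values): 3 samples x 4 taxa, each row sums to 1.
profiles = [
    [0.60, 0.00, 0.40, 0.00],
    [0.00, 0.25, 0.75, 0.00],
    [0.10, 0.00, 0.00, 0.90],
]

presence = to_presence_absence(profiles)
print(presence)
print(sparsity(profiles))
```

Either table, `profiles` or `presence`, could then serve as the feature matrix for the same classifier, which is the kind of comparison the description refers to.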
format Online
Article
Text
id pubmed-9064115
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-9064115 2022-05-04
PLoS Comput Biol
Research Article
Public Library of Science 2022-04-21
/pmc/articles/PMC9064115/ /pubmed/35446845 http://dx.doi.org/10.1371/journal.pcbi.1010066
Text en
© 2022 Giliberti et al. This is an open access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
title Host phenotype classification from human microbiome data is mainly driven by the presence of microbial taxa
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9064115/
https://www.ncbi.nlm.nih.gov/pubmed/35446845
http://dx.doi.org/10.1371/journal.pcbi.1010066