
Machine learning approaches in microbiome research: challenges and best practices

Microbiome data predictive analysis within a machine learning (ML) workflow presents numerous domain-specific challenges involving preprocessing, feature selection, predictive modeling, performance estimation, model interpretation, and the extraction of biological information from the results. To as...


Bibliographic Details
Main authors: Papoutsoglou, Georgios; Tarazona, Sonia; Lopes, Marta B.; Klammsteiner, Thomas; Ibrahimi, Eliana; Eckenberger, Julia; Novielli, Pierfrancesco; Tonda, Alberto; Simeon, Andrea; Shigdel, Rajesh; Béreux, Stéphane; Vitali, Giacomo; Tangaro, Sabina; Lahti, Leo; Temko, Andriy; Claesson, Marcus J.; Berland, Magali
Format: Online Article Text
Language: English
Published: Frontiers Media S.A. 2023
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10556866/
https://www.ncbi.nlm.nih.gov/pubmed/37808286
http://dx.doi.org/10.3389/fmicb.2023.1261889
author Papoutsoglou, Georgios
Tarazona, Sonia
Lopes, Marta B.
Klammsteiner, Thomas
Ibrahimi, Eliana
Eckenberger, Julia
Novielli, Pierfrancesco
Tonda, Alberto
Simeon, Andrea
Shigdel, Rajesh
Béreux, Stéphane
Vitali, Giacomo
Tangaro, Sabina
Lahti, Leo
Temko, Andriy
Claesson, Marcus J.
Berland, Magali
collection PubMed
description Microbiome data predictive analysis within a machine learning (ML) workflow presents numerous domain-specific challenges involving preprocessing, feature selection, predictive modeling, performance estimation, model interpretation, and the extraction of biological information from the results. To assist decision-making, we offer a set of recommendations on algorithm selection, pipeline creation and evaluation, stemming from the COST Action ML4Microbiome. We compared the suggested approaches on a multi-cohort shotgun metagenomics dataset of colorectal cancer patients, focusing on their performance in disease diagnosis and biomarker discovery. It is demonstrated that the use of compositional transformations and filtering methods as part of data preprocessing does not always improve the predictive performance of a model. In contrast, multivariate feature selection, such as the Statistically Equivalent Signatures algorithm, was effective in reducing the classification error. When validated on a separate test dataset, this algorithm, in combination with random forest modeling, provided the most accurate performance estimates. Lastly, we showed how linear modeling by logistic regression coupled with visualization techniques such as Individual Conditional Expectation (ICE) plots can yield interpretable results and offer biological insights. These findings are significant for clinicians and non-experts alike in translational applications.
format Online
Article
Text
id pubmed-10556866
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher Frontiers Media S.A.
record_format MEDLINE/PubMed
spelling pubmed-10556866 2023-10-07 Machine learning approaches in microbiome research: challenges and best practices Front Microbiol Microbiology Frontiers Media S.A.
2023-09-22 /pmc/articles/PMC10556866/ /pubmed/37808286 http://dx.doi.org/10.3389/fmicb.2023.1261889 Text en Copyright © 2023 Papoutsoglou, Tarazona, Lopes, Klammsteiner, Ibrahimi, Eckenberger, Novielli, Tonda, Simeon, Shigdel, Béreux, Vitali, Tangaro, Lahti, Temko, Claesson and Berland. https://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
title Machine learning approaches in microbiome research: challenges and best practices
topic Microbiology