
Benchmark of Data Processing Methods and Machine Learning Models for Gut Microbiome-Based Diagnosis of Inflammatory Bowel Disease


Bibliographic Details
Main Authors:	Kubinski, Ryszard; Djamen-Kepaou, Jean-Yves; Zhanabaev, Timur; Hernandez-Garcia, Alex; Bauer, Stefan; Hildebrand, Falk; Korcsmaros, Tamas; Karam, Sani; Jantchou, Prévost; Kafi, Kamran; Martin, Ryan D.
Format:	Online Article Text
Language:	English
Published:	Frontiers Media S.A. 2022
Subjects:	Genetics
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8895431/ https://www.ncbi.nlm.nih.gov/pubmed/35251123 http://dx.doi.org/10.3389/fgene.2022.784397

author	Kubinski, Ryszard; Djamen-Kepaou, Jean-Yves; Zhanabaev, Timur; Hernandez-Garcia, Alex; Bauer, Stefan; Hildebrand, Falk; Korcsmaros, Tamas; Karam, Sani; Jantchou, Prévost; Kafi, Kamran; Martin, Ryan D.
author_sort	Kubinski, Ryszard
collection	PubMed
description	Patients with inflammatory bowel disease (IBD) wait months and undergo numerous invasive procedures between the initial appearance of symptoms and receiving a diagnosis. In order to reduce time until diagnosis and improve patient wellbeing, machine learning algorithms capable of diagnosing IBD from the gut microbiome’s composition are currently being explored. To date, these models have had limited clinical application due to decreased performance when applied to a new cohort of patient samples. Various methods have been developed to analyze microbiome data, which may improve the generalizability of machine learning IBD diagnostic tests. With an abundance of methods, there is a need to benchmark the performance and generalizability of various machine learning pipelines (from data processing to training a machine learning model) for microbiome-based IBD diagnostic tools. We collected fifteen 16S rRNA microbiome datasets (7,707 samples) from North America to benchmark combinations of gut microbiome features, data normalization and transformation methods, batch effect correction methods, and machine learning models. Pipeline generalizability to new cohorts of patients was evaluated with two binary classification metrics following leave-one-dataset-out (LODO) cross-validation, where all samples from one study were left out of the training set and tested upon. We demonstrate that taxonomic features processed with a compositional transformation method and batch effect correction with the naive zero-centering method attain the best classification performance. In addition, machine learning models that identify non-linear decision boundaries between labels are more generalizable than those that are linearly constrained. Lastly, we illustrate the importance of generating a curated training dataset to ensure similar performance across patient demographics. These findings will help improve the generalizability of machine learning models as we move towards non-invasive diagnostic and disease management tools for patients with IBD.
format	Online Article Text
id	pubmed-8895431
institution	National Center for Biotechnology Information
language	English
publishDate	2022
publisher	Frontiers Media S.A.
record_format	MEDLINE/PubMed
spelling	pubmed-8895431 2022-03-05 Benchmark of Data Processing Methods and Machine Learning Models for Gut Microbiome-Based Diagnosis of Inflammatory Bowel Disease. Kubinski, Ryszard; Djamen-Kepaou, Jean-Yves; Zhanabaev, Timur; Hernandez-Garcia, Alex; Bauer, Stefan; Hildebrand, Falk; Korcsmaros, Tamas; Karam, Sani; Jantchou, Prévost; Kafi, Kamran; Martin, Ryan D. Front Genet. Genetics. Frontiers Media S.A. 2022-02-14 /pmc/articles/PMC8895431/ /pubmed/35251123 http://dx.doi.org/10.3389/fgene.2022.784397 Text en Copyright © 2022 Kubinski, Djamen-Kepaou, Zhanabaev, Hernandez-Garcia, Bauer, Hildebrand, Korcsmaros, Karam, Jantchou, Kafi and Martin. https://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
title	Benchmark of Data Processing Methods and Machine Learning Models for Gut Microbiome-Based Diagnosis of Inflammatory Bowel Disease
topic	Genetics
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8895431/ https://www.ncbi.nlm.nih.gov/pubmed/35251123 http://dx.doi.org/10.3389/fgene.2022.784397
