
Discovering Potential Taxonomic Biomarkers of Type 2 Diabetes From Human Gut Microbiota via Different Feature Selection Methods

Bibliographic Details
Main authors: Bakir-Gungor, Burcu, Bulut, Osman, Jabeer, Amhar, Nalbantoglu, O. Ufuk, Yousef, Malik
Format: Online Article Text
Language: English
Published: Frontiers Media S.A. 2021
Subjects: Microbiology
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8424122/
https://www.ncbi.nlm.nih.gov/pubmed/34512559
http://dx.doi.org/10.3389/fmicb.2021.628426
author Bakir-Gungor, Burcu
Bulut, Osman
Jabeer, Amhar
Nalbantoglu, O. Ufuk
Yousef, Malik
collection PubMed
description Human gut microbiota is a complex community of organisms including trillions of bacteria. While these microorganisms are considered essential regulators of our immune system, some of them can cause several diseases. In recent years, next-generation sequencing technologies have accelerated the discovery of human gut microbiota. In this respect, machine learning techniques have become popular for analyzing disease-associated metagenomics datasets. Type 2 diabetes (T2D) is a chronic disease that affects millions of people around the world. Since early diagnosis of T2D is important for effective treatment, there is an utmost need to develop a classification technique that can accelerate T2D diagnosis. In this study, using T2D-associated metagenomics data, we aim to develop a classification model to facilitate T2D diagnosis and to discover T2D-associated biomarkers. The sequencing data of T2D patients and healthy individuals were taken from a metagenome-wide association study and categorized into disease states. The sequencing reads were assigned to taxa, and the identified species were used to train and test our model. To deal with the high dimensionality of features, we applied robust feature selection algorithms such as Conditional Mutual Information Maximization, Maximum Relevance and Minimum Redundancy, Correlation Based Feature Selection, and the select K best approach. To test the performance of the classification based on the features selected by the different methods, we used a random forest classifier with 100-fold Monte Carlo cross-validation. In our experiments, we observed that 15 commonly selected features have a considerable effect in terms of minimizing the microbiota used for the diagnosis of T2D, thus reducing time and cost. When we performed biological validation of these identified species, we found that some of them are known to be related to T2D development mechanisms, and we identified additional species as potential biomarkers.
Additionally, we attempted to find subgroups of T2D patients using k-means clustering. In summary, this study utilizes several supervised and unsupervised machine learning algorithms to increase the diagnostic accuracy of T2D, investigates potential biomarkers of T2D, and identifies which subset of microbiota is more informative than other taxa by applying state-of-the-art feature selection methods.
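The abstract lists Maximum Relevance and Minimum Redundancy (mRMR) among the feature selection methods. As a rough illustration only, the Python sketch below implements the greedy difference form of mRMR: at each step it picks the taxon maximizing its mutual information with the class label minus its mean mutual information with the taxa already chosen. It assumes abundances have already been discretized; the function names `mutual_information` and `mrmr_select`, the toy abundance matrix, and the labels are all hypothetical. The study's actual implementation, discretization, and parameters are not described in this record.

```python
from __future__ import annotations

import math
from collections import Counter
from typing import Sequence


def mutual_information(x: Sequence[int], y: Sequence[int]) -> float:
    """Empirical mutual information I(X;Y), in nats, of two discrete sequences."""
    n = len(x)
    count_x = Counter(x)
    count_y = Counter(y)
    count_xy = Counter(zip(x, y))
    mi = 0.0
    for (a, b), c in count_xy.items():
        # p(a,b) * log(p(a,b) / (p(a) * p(b))), written with raw counts
        mi += (c / n) * math.log(c * n / (count_x[a] * count_y[b]))
    return mi


def mrmr_select(features: Sequence[Sequence[int]], labels: Sequence[int], k: int) -> list[int]:
    """Greedy mRMR (difference criterion): indices of up to k selected features.

    features[j] holds the discretized abundance of taxon j in every sample
    (hypothetical input layout); labels holds the class of each sample.
    """
    relevance = [mutual_information(f, labels) for f in features]
    selected: list[int] = []
    remaining = list(range(len(features)))
    while remaining and len(selected) < k:
        best_j, best_score = remaining[0], -math.inf
        for j in remaining:
            if selected:
                # Mean redundancy with the taxa chosen so far
                redundancy = sum(
                    mutual_information(features[j], features[s]) for s in selected
                ) / len(selected)
            else:
                redundancy = 0.0
            score = relevance[j] - redundancy
            if score > best_score:  # strict '>' keeps the lowest index on ties
                best_j, best_score = j, score
        selected.append(best_j)
        remaining.remove(best_j)
    return selected


if __name__ == "__main__":
    # Hypothetical toy data: 8 samples, 0 = healthy, 1 = T2D
    labels = [0, 0, 0, 0, 1, 1, 1, 1]
    taxa = [
        [0, 0, 0, 1, 0, 1, 1, 1],  # taxon 0: informative
        [0, 0, 0, 1, 0, 1, 1, 1],  # taxon 1: exact copy of taxon 0 (redundant)
        [0, 0, 1, 0, 1, 0, 1, 1],  # taxon 2: informative, independent of taxon 0
    ]
    print(sorted(mrmr_select(taxa, labels, k=2)))  # → [0, 2]
```

Taxon 1 is exactly as relevant as taxon 0, but once either taxon 0 or taxon 2 is chosen, the redundancy term penalizes the duplicate, so the sketch keeps taxa 0 and 2. A filter such as select K best would rank by relevance alone and could keep both copies.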
format Online
Article
Text
id pubmed-8424122
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher Frontiers Media S.A.
record_format MEDLINE/PubMed
spelling pubmed-8424122 2021-09-09 Front Microbiol Microbiology Frontiers Media S.A. 2021-08-25 /pmc/articles/PMC8424122/ /pubmed/34512559 http://dx.doi.org/10.3389/fmicb.2021.628426 Text en Copyright © 2021 Bakir-Gungor, Bulut, Jabeer, Nalbantoglu and Yousef. https://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
title Discovering Potential Taxonomic Biomarkers of Type 2 Diabetes From Human Gut Microbiota via Different Feature Selection Methods
topic Microbiology