
Hard Voting Ensemble Approach for the Detection of Type 2 Diabetes in Mexican Population with Non-Glucose Related Features

Type 2 diabetes mellitus (T2DM) is one of the biggest health problems in Mexico, and early detection of the disease and its complications is extremely important. For noninvasive detection of T2DM, a machine learning (ML) approach that uses ensemble classification models with dichotomous o...


Bibliographic Details
Main authors: Morgan-Benita, Jorge A., Galván-Tejada, Carlos E., Cruz, Miguel, Galván-Tejada, Jorge I., Gamboa-Rosales, Hamurabi, Arceo-Olague, Jose G., Luna-García, Huizilopoztli, Celaya-Padilla, José M.
Format: Online Article Text
Language: English
Published: MDPI 2022
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9331873/
https://www.ncbi.nlm.nih.gov/pubmed/35893185
http://dx.doi.org/10.3390/healthcare10081362
_version_ 1784758508604882944
author Morgan-Benita, Jorge A.
Galván-Tejada, Carlos E.
Cruz, Miguel
Galván-Tejada, Jorge I.
Gamboa-Rosales, Hamurabi
Arceo-Olague, Jose G.
Luna-García, Huizilopoztli
Celaya-Padilla, José M.
author_sort Morgan-Benita, Jorge A.
collection PubMed
description Type 2 diabetes mellitus (T2DM) is one of the biggest health problems in Mexico, and early detection of the disease and its complications is extremely important. For noninvasive detection of T2DM, a machine learning (ML) approach based on ensemble classification models with dichotomous output, which is also fast and effective for early detection and prediction of T2DM, can be used. In this article, a hard-voting ensemble technique is designed and implemented using generalized linear regression (GLM), support vector machines (SVM) and artificial neural networks (ANN) for the classification of T2DM patients. In the materials and methods, as a first step, the data are balanced, standardized and imputed, then fed into the three models to classify patients with a dichotomous result. For feature selection, a LASSO implementation with 10-fold cross-validation is developed, and the area under the curve (AUC) is used for the final validation. LASSO selected 12 features, which are used in the implemented models to obtain the best possible scenario for the developed ensemble model. Of the three algorithms, SVM performed best, with an AUC of 92% ± 3%. The ensemble model built with GLM, SVM and ANN obtained an AUC of 90% ± 3%.
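The hard-voting step named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name hard_vote and the three prediction lists are hypothetical, made-up values that stand in for the dichotomous outputs of the GLM, SVM and ANN models. The sketch only shows the majority rule, in which each model casts one vote per patient and the most frequent label wins.

```python
from collections import Counter
from typing import List, Sequence


def hard_vote(predictions: Sequence[Sequence[int]]) -> List[int]:
    """Return, for each sample, the label predicted by most models.

    `predictions` holds one sequence of labels per model; all sequences
    must have the same length (one entry per sample).
    """
    if not predictions:
        raise ValueError("at least one model's predictions are required")
    n_samples = len(predictions[0])
    if any(len(p) != n_samples for p in predictions):
        raise ValueError("all models must predict the same number of samples")

    voted = []
    # zip(*predictions) groups the votes of all models for one sample.
    for sample_votes in zip(*predictions):
        # With three binary voters there is always a strict majority.
        voted.append(Counter(sample_votes).most_common(1)[0][0])
    return voted


# Hypothetical dichotomous outputs (1 = T2DM, 0 = control) for five
# patients; illustrative values only, not data from the study.
glm_pred = [1, 0, 1, 0, 1]
svm_pred = [1, 1, 0, 0, 1]
ann_pred = [0, 1, 1, 0, 1]

ensemble_pred = hard_vote([glm_pred, svm_pred, ann_pred])
print(ensemble_pred)  # [1, 1, 1, 0, 1]
```

An odd number of binary classifiers, as here, avoids ties. With an even number of voters a tie-breaking rule would have to be chosen, and the abstract does not describe one.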
format Online
Article
Text
id pubmed-9331873
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher MDPI
record_format MEDLINE/PubMed
spelling pubmed-9331873 2022-07-29 Healthcare (Basel) Article MDPI 2022-07-22 /pmc/articles/PMC9331873/ /pubmed/35893185 http://dx.doi.org/10.3390/healthcare10081362 Text en © 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
title Hard Voting Ensemble Approach for the Detection of Type 2 Diabetes in Mexican Population with Non-Glucose Related Features
topic Article