Hard Voting Ensemble Approach for the Detection of Type 2 Diabetes in Mexican Population with Non-Glucose Related Features
Type 2 diabetes mellitus (T2DM) is one of the biggest health problems in Mexico, and it is extremely important to detect this disease and its complications early. For noninvasive detection of T2DM, a machine learning (ML) approach that uses ensemble classification models with dichotomous o...
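Main authors: | Morgan-Benita, Jorge A.; Galván-Tejada, Carlos E.; Cruz, Miguel; Galván-Tejada, Jorge I.; Gamboa-Rosales, Hamurabi; Arceo-Olague, Jose G.; Luna-García, Huizilopoztli; Celaya-Padilla, José M. |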
---|---|
Format: | Online Article Text |
Language: | English |
Published: | MDPI 2022 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9331873/ https://www.ncbi.nlm.nih.gov/pubmed/35893185 http://dx.doi.org/10.3390/healthcare10081362 |
_version_ | 1784758508604882944 |
---|---|
author | Morgan-Benita, Jorge A.; Galván-Tejada, Carlos E.; Cruz, Miguel; Galván-Tejada, Jorge I.; Gamboa-Rosales, Hamurabi; Arceo-Olague, Jose G.; Luna-García, Huizilopoztli; Celaya-Padilla, José M. |
collection | PubMed |
description | Type 2 diabetes mellitus (T2DM) is one of the biggest health problems in Mexico, and it is extremely important to detect this disease and its complications early. For noninvasive detection of T2DM, a machine learning (ML) approach can be used that relies on ensemble classification models with dichotomous output and is fast and effective for early detection and prediction of T2DM. In this article, a hard voting ensemble technique is designed and implemented using generalized linear regression (GLM), support vector machines (SVM) and artificial neural networks (ANN) to classify T2DM patients. As a first step in the materials and methods, the data are balanced, standardized, imputed and fed into the three models, which classify the patients with a dichotomous result. For feature selection, LASSO is implemented with 10-fold cross-validation, and the area under the curve (AUC) is used for the final validation. LASSO selected 12 features, which are used in the implemented models to obtain the best possible scenario for the developed ensemble model. Of the three algorithms, SVM performed best, with an AUC of 92% ± 3%. The ensemble model built with GLM, SVM and ANN obtained an AUC of 90% ± 3%. |
format | Online Article Text |
id | pubmed-9331873 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2022 |
publisher | MDPI |
record_format | MEDLINE/PubMed |
spelling | pubmed-9331873 2022-07-29 Healthcare (Basel) Article MDPI 2022-07-22 /pmc/articles/PMC9331873/ /pubmed/35893185 http://dx.doi.org/10.3390/healthcare10081362 Text en © 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/). |
title | Hard Voting Ensemble Approach for the Detection of Type 2 Diabetes in Mexican Population with Non-Glucose Related Features |
topic | Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9331873/ https://www.ncbi.nlm.nih.gov/pubmed/35893185 http://dx.doi.org/10.3390/healthcare10081362 |
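
The description above names two techniques that a small example can make concrete: hard voting across three classifiers and validation by AUC. The following is a minimal sketch in plain Python, not the authors' implementation. The function names `hard_vote` and `auc`, the variable names, and all label values are hypothetical toy data chosen for illustration; none of them come from the article.

```python
from collections import Counter


def hard_vote(predictions):
    """Combine per-model label lists by majority vote (hard voting).

    `predictions` holds one list of predicted labels per model.
    With three models and two classes a tie cannot occur.
    """
    n_samples = len(predictions[0])
    if any(len(p) != n_samples for p in predictions):
        raise ValueError("every model must predict the same number of samples")
    voted = []
    for sample_votes in zip(*predictions):
        # most_common(1) returns the label with the highest vote count
        voted.append(Counter(sample_votes).most_common(1)[0][0])
    return voted


def auc(y_true, scores):
    """Area under the ROC curve via pairwise comparison.

    Equals the fraction of (positive, negative) pairs in which the
    positive sample has the higher score; ties count as one half.
    """
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy dichotomous predictions (1 = T2DM, 0 = control); invented values.
glm_pred = [1, 0, 1, 0, 1, 0]
svm_pred = [1, 0, 0, 0, 1, 1]
ann_pred = [0, 0, 1, 1, 1, 0]
y_true = [1, 0, 1, 0, 1, 1]

ensemble_pred = hard_vote([glm_pred, svm_pred, ann_pred])
ensemble_auc = auc(y_true, ensemble_pred)
print(ensemble_pred, ensemble_auc)
```

Because hard voting outputs class labels rather than probabilities, the AUC here is computed on dichotomous predictions; a soft-voting variant would average per-model probabilities instead. Whether the article computes the ensemble AUC this way is not stated in this record.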