Diabetes classification model based on boosting algorithms

Bibliographic Details
Main authors: Chen, Peihua; Pan, Chuandi
Format: Online Article Text
Language: English
Published: BioMed Central 2018
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5872396/
https://www.ncbi.nlm.nih.gov/pubmed/29587624
http://dx.doi.org/10.1186/s12859-018-2090-9
Description
Summary: BACKGROUND: Diabetes mellitus is a common and complicated chronic lifelong disease. Hence, it is of high clinical significance to find the most relevant clinical indexes and to perform efficient computer-aided pre-diagnoses and diagnoses. RESULTS: Non-parametric statistical testing is performed on hundreds of medical measurement index results between diabetic and non-diabetic populations. Two common boosting algorithms, Adaboost.M1 and LogitBoost, are selected to establish a machine model for diabetes diagnosis based on these clinical test data, involving a total of 35,669 individuals. The machine classification models built by these two algorithms have very good classification ability. Here, the LogitBoost classification model is slightly better than the Adaboost.M1 classification model. The overall accuracy of the LogitBoost classification model reached 95.30% when using 10-fold cross validation. The true positive, true negative, false positive, and false negative rates of the binary classification model were 0.921, 0.969, 0.031, and 0.079, respectively, and the area under the receiver operating characteristic curve reached 0.99. CONCLUSIONS: The boosting algorithms show excellent performance for the diabetes classification models based on clinical medical data. The coefficient matrix of the original data is a sparse matrix, because some of the test results were missing, including some that were directly related to disease diagnosis. Therefore, the model is robust and has a degree of pre-diagnosis function. In the process of selecting the preferred test items, the most statistically significant discriminating factors between the diabetic and general populations were obtained and can be used as reference risk factors for diabetes mellitus.
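
The rates quoted in the summary can be cross-checked with a few lines of arithmetic. The sketch below is not taken from the article. It assumes the four rates are per-class rates from the same evaluation as the 95.30% accuracy, and all function names are hypothetical. Under that assumption the false positive and false negative rates should be the complements of the true negative and true positive rates, and the overall accuracy should be a weighted average of the two true rates. The implied fraction of diabetic cases is only a back-of-envelope figure that is sensitive to rounding; the record itself does not state the class balance.

```python
# Figures as quoted in the summary (LogitBoost, 10-fold cross validation).
TPR = 0.921       # true positive rate (sensitivity)
TNR = 0.969       # true negative rate (specificity)
FPR = 0.031       # false positive rate
FNR = 0.079       # false negative rate
ACCURACY = 0.9530  # overall accuracy


def is_complementary(rate_a, rate_b, tol=1e-9):
    """Return True if the two rates sum to 1 within a small tolerance."""
    return abs(rate_a + rate_b - 1.0) <= tol


def overall_accuracy(positive_fraction, tpr, tnr):
    """Accuracy as the class-weighted average of the two true rates."""
    return positive_fraction * tpr + (1.0 - positive_fraction) * tnr


def implied_positive_fraction(accuracy, tpr, tnr):
    """Solve accuracy = p * tpr + (1 - p) * tnr for p.

    Assumption (not stated in the record): the rates are per-class rates
    measured on the same data as the accuracy. Requires tpr != tnr.
    """
    return (tnr - accuracy) / (tnr - tpr)


if __name__ == "__main__":
    print("TPR + FNR = 1:", is_complementary(TPR, FNR))
    print("TNR + FPR = 1:", is_complementary(TNR, FPR))
    p = implied_positive_fraction(ACCURACY, TPR, TNR)
    print("Implied diabetic fraction (rough):", round(p, 3))
    print("Accuracy recomputed from p:", round(overall_accuracy(p, TPR, TNR), 4))
```

With the quoted figures both complement checks pass, and the implied diabetic fraction comes out at roughly one third. Because the published rates are rounded to three decimals, that fraction should be read as an estimate, not as a reported result.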