Diabetes classification model based on boosting algorithms

BACKGROUND: Diabetes mellitus is a common and complicated chronic lifelong disease. Hence, it is of high clinical significance to find the most relevant clinical indexes and to perform efficient computer-aided pre-diagnoses and diagnoses. RESULTS: Non-parametric statistical testing is performed on h...

Bibliographic Details
Main authors: Chen, Peihua, Pan, Chuandi
Format: Online Article Text
Language: English
Published: BioMed Central 2018
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5872396/
https://www.ncbi.nlm.nih.gov/pubmed/29587624
http://dx.doi.org/10.1186/s12859-018-2090-9
_version_ 1783309827497263104
author Chen, Peihua
Pan, Chuandi
collection PubMed
description BACKGROUND: Diabetes mellitus is a common and complicated chronic lifelong disease. Hence, it is of high clinical significance to find the most relevant clinical indexes and to perform efficient computer-aided pre-diagnoses and diagnoses. RESULTS: Non-parametric statistical testing is performed on hundreds of medical measurement index results between diabetic and non-diabetic populations. Two common boosting algorithms, Adaboost.M1 and LogitBoost, are selected to establish a machine model for diabetes diagnosis based on these clinical test data, involving a total of 35,669 individuals. The machine classification models built by these two algorithms have very good classification ability. Here, the LogitBoost classification model is slightly better than the Adaboost.M1 classification model. The overall accuracy of the LogitBoost classification model reached 95.30% when using 10-fold cross validation. The true positive, true negative, false positive, and false negative rates of the binary classification model were 0.921, 0.969, 0.031, and 0.079, respectively, and the area under the receiver operating characteristic curve reached 0.99. CONCLUSIONS: The boosting algorithms show excellent performance for the diabetes classification models based on clinical medical data. The coefficient matrix of the original data is a sparse matrix, because some of the test results were missing, including some that were directly related to disease diagnosis. Therefore, the model is robust and has a degree of pre-diagnosis function. In the process of selecting the preferred test items, the most statistically significant discriminating factors between the diabetic and general populations were obtained and can be used as reference risk factors for diabetes mellitus.
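The four rates quoted in the description are tied together by simple arithmetic, which the following minimal Python sketch checks. It uses only the figures stated in the abstract; the function names `overall_accuracy` and `implied_positive_share` are illustrative and not from the article. The implied share of diabetic cases is a back-of-the-envelope inference from rounded values, not a figure reported in this record.

```python
# Figures quoted in the abstract for the LogitBoost model.
tpr, tnr, fpr, fnr = 0.921, 0.969, 0.031, 0.079
accuracy = 0.9530

# Each error rate is the complement of the matching success rate.
assert abs(tpr + fnr - 1.0) < 1e-9
assert abs(tnr + fpr - 1.0) < 1e-9


def overall_accuracy(tpr, tnr, positive_share):
    """Accuracy as the class-share-weighted mean of TPR and TNR."""
    return positive_share * tpr + (1.0 - positive_share) * tnr


def implied_positive_share(accuracy, tpr, tnr):
    """Solve accuracy = p * tpr + (1 - p) * tnr for p (needs tpr != tnr)."""
    return (tnr - accuracy) / (tnr - tpr)


# Assumption: the quoted rates and accuracy describe the same evaluation.
share = implied_positive_share(accuracy, tpr, tnr)
print(f"implied diabetic share: {share:.3f}")
print(f"accuracy recomputed:    {overall_accuracy(tpr, tnr, share):.4f}")
```

Because the published rates are rounded to three decimals, the implied share should be read as approximate.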
format Online
Article
Text
id pubmed-5872396
institution National Center for Biotechnology Information
language English
publishDate 2018
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-5872396 2018-04-02 Diabetes classification model based on boosting algorithms Chen, Peihua Pan, Chuandi BMC Bioinformatics Research
BioMed Central 2018-03-27 /pmc/articles/PMC5872396/ /pubmed/29587624 http://dx.doi.org/10.1186/s12859-018-2090-9 Text en © The Author(s). 2018 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title Diabetes classification model based on boosting algorithms
topic Research
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5872396/
https://www.ncbi.nlm.nih.gov/pubmed/29587624
http://dx.doi.org/10.1186/s12859-018-2090-9