Application of multi-label classification models for the diagnosis of diabetic complications
BACKGROUND: Early diagnosis of diabetic complications is clinically demanding and of great significance. Given the complexity of diabetic complications, we applied a multi-label classification (MLC) model to predict four diabetic complications simultaneously using data in the modern electroni...
Main authors: | Zhou, Liang; Zheng, Xiaoyuan; Yang, Di; Wang, Ying; Bai, Xuesong; Ye, Xinhua |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central 2021 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8182940/ https://www.ncbi.nlm.nih.gov/pubmed/34098959 http://dx.doi.org/10.1186/s12911-021-01525-7 |
_version_ | 1783704287719718912 |
---|---|
author | Zhou, Liang Zheng, Xiaoyuan Yang, Di Wang, Ying Bai, Xuesong Ye, Xinhua |
author_facet | Zhou, Liang Zheng, Xiaoyuan Yang, Di Wang, Ying Bai, Xuesong Ye, Xinhua |
author_sort | Zhou, Liang |
collection | PubMed |
description | BACKGROUND: Early diagnosis of diabetic complications is clinically demanding and of great significance. Given the complexity of diabetic complications, we applied a multi-label classification (MLC) model to predict four diabetic complications simultaneously using data from modern electronic health records (EHRs), and leveraged the correlations between the complications to further improve the prediction accuracy. METHODS: We obtained the demographic characteristics and laboratory data from the EHRs for patients admitted to Changzhou No. 2 People’s Hospital, the affiliated hospital of Nanjing Medical University in China, from May 2013 to June 2020. The data included 93 biochemical indicators and 9,765 patients. We used the Pearson correlation coefficient (PCC) to analyze the correlations between different diabetic complications from a statistical perspective. We used an MLC model, based on the Random Forest (RF) technique, to leverage these correlations and predict four complications simultaneously. We explored four different MLC models: Label Power Set (LP), Classifier Chains (CC), Ensemble Classifier Chains (ECC), and Calibrated Label Ranking (CLR). We used traditional Binary Relevance (BR) as a baseline for comparison. We used 11 different performance metrics and the area under the receiver operating characteristic curve (AUROC) to evaluate these models. We analyzed the weights of the learned model and illustrated (1) the top 10 key indicators of different complications and (2) the correlations between different diabetic complications. RESULTS: The MLC models including CC, ECC and CLR outperformed the traditional BR method on most performance metrics; the ECC models performed best on Hamming loss (0.1760), Accuracy (0.7020), F1_Score (0.7855), Precision (0.8649), F1_micro (0.8078), F1_macro (0.7773), Recall_micro (0.8631), Recall_macro (0.8009), and AUROC (0.8231).
The two diabetic complication correlation matrices drawn from the PCC analysis and the MLC models were consistent with each other and indicated that the complications correlated to different extents. The top 10 key indicators given by the model are valuable in medical application. CONCLUSIONS: Our MLC model can effectively utilize the potential correlation between different diabetic complications to further improve the prediction accuracy. This model should be explored further in other complex diseases with multiple complications. SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1186/s12911-021-01525-7. |
format | Online Article Text |
id | pubmed-8182940 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2021 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-8182940 2021-06-09 Application of multi-label classification models for the diagnosis of diabetic complications Zhou, Liang Zheng, Xiaoyuan Yang, Di Wang, Ying Bai, Xuesong Ye, Xinhua BMC Med Inform Decis Mak Research BioMed Central 2021-06-07 /pmc/articles/PMC8182940/ /pubmed/34098959 http://dx.doi.org/10.1186/s12911-021-01525-7 Text en © The Author(s) 2021 https://creativecommons.org/licenses/by/4.0/ Open Access: This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data. |
spellingShingle | Research Zhou, Liang Zheng, Xiaoyuan Yang, Di Wang, Ying Bai, Xuesong Ye, Xinhua Application of multi-label classification models for the diagnosis of diabetic complications |
title | Application of multi-label classification models for the diagnosis of diabetic complications |
title_full | Application of multi-label classification models for the diagnosis of diabetic complications |
title_fullStr | Application of multi-label classification models for the diagnosis of diabetic complications |
title_full_unstemmed | Application of multi-label classification models for the diagnosis of diabetic complications |
title_short | Application of multi-label classification models for the diagnosis of diabetic complications |
title_sort | application of multi-label classification models for the diagnosis of diabetic complications |
topic | Research |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8182940/ https://www.ncbi.nlm.nih.gov/pubmed/34098959 http://dx.doi.org/10.1186/s12911-021-01525-7 |
work_keys_str_mv | AT zhouliang applicationofmultilabelclassificationmodelsforthediagnosisofdiabeticcomplications AT zhengxiaoyuan applicationofmultilabelclassificationmodelsforthediagnosisofdiabeticcomplications AT yangdi applicationofmultilabelclassificationmodelsforthediagnosisofdiabeticcomplications AT wangying applicationofmultilabelclassificationmodelsforthediagnosisofdiabeticcomplications AT baixuesong applicationofmultilabelclassificationmodelsforthediagnosisofdiabeticcomplications AT yexinhua applicationofmultilabelclassificationmodelsforthediagnosisofdiabeticcomplications |
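
The abstract reports Hamming loss, micro- and macro-averaged F1, and Pearson correlations between complications, but this record does not give their definitions. The following is a minimal sketch assuming the standard textbook definitions of those quantities; the toy label matrices `Y_TRUE` and `Y_PRED` and all function names are hypothetical and are not taken from the article or its data.

```python
from math import sqrt

# Hypothetical toy data: each row is a patient, each column one of four
# complications (1 = present, 0 = absent). Not from the study.
Y_TRUE = [
    [1, 0, 1, 0],
    [0, 0, 1, 1],
    [1, 1, 0, 0],
    [0, 0, 0, 0],
    [1, 0, 1, 1],
]
Y_PRED = [
    [1, 0, 1, 0],
    [0, 1, 1, 1],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
    [1, 0, 0, 1],
]


def column(rows, j):
    """Return label column j as a flat list."""
    return [row[j] for row in rows]


def hamming_loss(y_true, y_pred):
    """Fraction of individual (patient, label) slots predicted incorrectly."""
    total = sum(len(row) for row in y_true)
    wrong = sum(t != p for rt, rp in zip(y_true, y_pred) for t, p in zip(rt, rp))
    return wrong / total


def label_counts(y_true, y_pred, j):
    """True positives, false positives, false negatives for label j."""
    pairs = list(zip(column(y_true, j), column(y_pred, j)))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, fp, fn


def f1_from_counts(tp, fp, fn):
    """F1 = 2TP / (2TP + FP + FN); defined as 0.0 when the denominator is 0."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def f1_micro(y_true, y_pred):
    """Pool the counts over all labels, then compute a single F1."""
    n_labels = len(y_true[0])
    counts = [label_counts(y_true, y_pred, j) for j in range(n_labels)]
    tp, fp, fn = (sum(c[i] for c in counts) for i in range(3))
    return f1_from_counts(tp, fp, fn)


def f1_macro(y_true, y_pred):
    """Compute F1 per label, then take the unweighted mean."""
    n_labels = len(y_true[0])
    scores = [f1_from_counts(*label_counts(y_true, y_pred, j)) for j in range(n_labels)]
    return sum(scores) / n_labels


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if one is constant)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / sqrt(vx * vy)


if __name__ == "__main__":
    print(f"Hamming loss: {hamming_loss(Y_TRUE, Y_PRED):.4f}")
    print(f"F1 micro:     {f1_micro(Y_TRUE, Y_PRED):.4f}")
    print(f"F1 macro:     {f1_macro(Y_TRUE, Y_PRED):.4f}")
    r = pearson(column(Y_TRUE, 0), column(Y_TRUE, 2))
    print(f"PCC(label 0, label 2): {r:.4f}")
```

On this toy data the second label is never predicted correctly, so the macro average (which weights every label equally) falls well below the micro average (which pools counts across labels). That gap is one reason a multi-label study might report both.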