Using random forest algorithm for glomerular and tubular injury diagnosis

OBJECTIVES: Chronic kidney disease (CKD) is a common chronic condition with high incidence and insidious onset. Glomerular injury (GI) and tubular injury (TI) represent early manifestations of CKD and could indicate the risk of its development. In this study, we aimed to classify GI and TI using thr...

Bibliographic Details
Main authors: Song, Wenzhu; Zhou, Xiaoshuang; Duan, Qi; Wang, Qian; Li, Yaheng; Li, Aizhong; Zhou, Wenjing; Sun, Lin; Qiu, Lixia; Li, Rongshan; Li, Yafeng
Format: Online Article Text
Language: English
Published: Frontiers Media S.A. 2022
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9366016/
https://www.ncbi.nlm.nih.gov/pubmed/35966858
http://dx.doi.org/10.3389/fmed.2022.911737
author Song, Wenzhu
Zhou, Xiaoshuang
Duan, Qi
Wang, Qian
Li, Yaheng
Li, Aizhong
Zhou, Wenjing
Sun, Lin
Qiu, Lixia
Li, Rongshan
Li, Yafeng
collection PubMed
description OBJECTIVES: Chronic kidney disease (CKD) is a common chronic condition with high incidence and insidious onset. Glomerular injury (GI) and tubular injury (TI) are early manifestations of CKD and may indicate the risk of its development. In this study, we aimed to classify GI and TI using three machine learning algorithms to promote their early diagnosis and slow the progression of CKD. METHODS: Demographic information, physical examination data, blood samples, and morning urine samples were collected from 13,550 subjects in 10 counties in Shanxi province for the classification of GI and TI. LASSO regression was used for feature selection among the explanatory variables, and the SMOTE (synthetic minority over-sampling technique) algorithm was used to balance the two target datasets, i.e., GI and TI. Random Forest (RF), Naive Bayes (NB), and logistic regression (LR) models were then constructed to classify GI and TI separately. RESULTS: A total of 12,330 participants were enrolled in this study, with 20 explanatory variables. The numbers of patients with GI and TI were 1,587 (12.8%) and 1,456 (11.8%), respectively. After feature selection by LASSO, 14 and 15 explanatory variables remained in the two datasets. After SMOTE, the numbers of patients and normal subjects were 6,165 and 6,165 for GI, and 6,165 and 6,164 for TI. RF outperformed NB and LR in terms of accuracy (78.14% and 80.49%), sensitivity (82.00% and 84.60%), specificity (74.29% and 76.09%), and AUC (0.868 and 0.885) for both GI and TI; the four variables contributing most to classification were SBP, DBP, sex, and age for GI, and age, SBP, FPG, and GHb for TI. CONCLUSION: RF performs well in classifying GI and TI, which allows for early auxiliary diagnosis of both conditions, may help slow the progression of CKD, and shows promise for clinical practice.
format Online
Article
Text
id pubmed-9366016
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher Frontiers Media S.A.
record_format MEDLINE/PubMed
spelling pubmed-9366016 2022-08-12 Using random forest algorithm for glomerular and tubular injury diagnosis Song, Wenzhu; Zhou, Xiaoshuang; Duan, Qi; Wang, Qian; Li, Yaheng; Li, Aizhong; Zhou, Wenjing; Sun, Lin; Qiu, Lixia; Li, Rongshan; Li, Yafeng Front Med (Lausanne) Medicine Frontiers Media S.A. 2022-07-28 /pmc/articles/PMC9366016/ /pubmed/35966858 http://dx.doi.org/10.3389/fmed.2022.911737 Text en Copyright © 2022 Song, Zhou, Duan, Wang, Li, Li, Zhou, Sun, Qiu, Li and Li. https://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
title Using random forest algorithm for glomerular and tubular injury diagnosis
topic Medicine
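
The abstract in this record says SMOTE was used to balance the GI and TI datasets but gives no parameters or implementation details. As a rough illustration of the idea only, the sketch below generates synthetic minority rows by interpolating between a minority sample and one of its nearest minority neighbours. It is not the authors' code: the function name `smote`, the value of `k`, the random sampling of base rows, and the toy data are all assumptions made for this example.

```python
import math
import random


def smote(minority, n_synthetic, k=5, seed=0):
    """Create n_synthetic rows by interpolating between minority rows and
    their k nearest minority neighbours (a simplified SMOTE sketch)."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_synthetic):
        # Pick a base minority row at random (a simplification of the
        # original algorithm, which loops over every minority row).
        i = rng.randrange(len(minority))
        x = minority[i]
        # Indices of the k nearest minority neighbours of x, excluding x.
        neighbours = sorted(
            (j for j in range(len(minority)) if j != i),
            key=lambda j: math.dist(x, minority[j]),
        )[:k]
        nn = minority[rng.choice(neighbours)]
        gap = rng.random()  # position along the segment from x to nn
        synthetic.append([a + gap * (b - a) for a, b in zip(x, nn)])
    return synthetic


# Toy data (hypothetical values, not from the study):
# 8 "normal" rows and 3 "injury" rows with two features each.
majority = [[float(i), float(i % 3)] for i in range(8)]
minority = [[10.0, 10.0], [11.0, 12.0], [12.0, 11.0]]

new_rows = smote(minority, n_synthetic=len(majority) - len(minority), k=2)
balanced_minority = minority + new_rows
print(len(majority), len(balanced_minority))  # prints: 8 8
```

Because each synthetic row lies on a line segment between two real minority rows, the new rows stay inside the region already occupied by the minority class. In practice a maintained implementation would normally be used instead of a hand-written one, and oversampling would be applied only to the training split so that synthetic rows do not leak into the evaluation data.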