Prediction for cardiovascular diseases based on laboratory data: An analysis of random forest model

BACKGROUND: To establish a prediction model for cardiovascular diseases (CVD) in the general population based on random forests. METHODS: A retrospective study involving 498 subjects was conducted in Xi'an Medical University between 2011 and 2018. The random forest algorithm was used to screen...

Bibliographic Details
Main Authors: Su, Xi, Xu, Yongyong, Tan, Zhijun, Wang, Xia, Yang, Peng, Su, Yani, Jiang, Yangyang, Qin, Sijia, Shang, Lei
Format: Online Article Text
Language: English
Published: John Wiley and Sons Inc. 2020
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7521325/
https://www.ncbi.nlm.nih.gov/pubmed/32725839
http://dx.doi.org/10.1002/jcla.23421
_version_ 1783587955584008192
author Su, Xi
Xu, Yongyong
Tan, Zhijun
Wang, Xia
Yang, Peng
Su, Yani
Jiang, Yangyang
Qin, Sijia
Shang, Lei
collection PubMed
description BACKGROUND: To establish a prediction model for cardiovascular diseases (CVD) in the general population based on random forests. METHODS: A retrospective study involving 498 subjects was conducted in Xi'an Medical University between 2011 and 2018. The random forest algorithm was used to screen out the variables that greatly affected the CVD prediction and to establish a prediction model. The important variables were included in the multifactorial logistic regression analysis. The area under the curve (AUC) was compared between logistic regression model and random forest model. RESULTS: The random forest model revealed the variables, including the age, body mass index (BMI), fasting blood glucose (FBG), diastolic blood pressure (DBP), triglyceride (TG), systolic blood pressure (SBP), total cholesterol (TC), waist circumference, and high‐density lipoprotein‐cholesterol (HDL‐C), were more significant for CVD prediction; the AUC was 0.802 in CVD prediction. Multifactorial logistic regression analysis indicated that the risk factors for CVD included the age [odds ratio (OR): 1.14, 95% confidence intervals (CI): 1.10‐1.17, P < .001], BMI (OR: 1.13, 95% CI: 1.06‐1.20, P < .001), TG (OR: 1.11, 95% CI: 1.02‐1.22, P = .023), and DBP (OR: 1.04, 95% CI: 1.02‐1.06, P = .001); the AUC was 0.843 in CVD prediction. The established logistic regression prediction model was Logit P = Log[P/(1 − P)] = −11.47 + 0.13 × age + 0.12 × BMI + 0.11 × TG + 0.04 × DBP; P = 1/[1 + exp(−Logit P)]. People were prone to develop CVD at the time of P > .51. CONCLUSIONS: A prediction model for CVD is developed in the general population based on random forests, which provides a simple tool for the early prediction of CVD.
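The regression equation quoted in the description can be evaluated directly. The following is a minimal Python sketch, not the authors' code: the function names are hypothetical, and the units (age in years, BMI in kg/m^2, TG in mmol/L, DBP in mmHg) are assumptions, since the abstract does not state them. It also assumes that Log in the equation denotes the natural logarithm, as is conventional for a logit.

```python
import math

# Coefficients exactly as printed in the abstract.
INTERCEPT = -11.47
COEF_AGE = 0.13
COEF_BMI = 0.12
COEF_TG = 0.11
COEF_DBP = 0.04

# Cut-off quoted in the abstract ("prone to develop CVD" when P > .51).
THRESHOLD = 0.51


def cvd_logit(age, bmi, tg, dbp):
    """Logit P = -11.47 + 0.13*age + 0.12*BMI + 0.11*TG + 0.04*DBP.

    Assumed units (not given in the abstract): age in years,
    BMI in kg/m^2, TG in mmol/L, DBP in mmHg.
    """
    return (INTERCEPT
            + COEF_AGE * age
            + COEF_BMI * bmi
            + COEF_TG * tg
            + COEF_DBP * dbp)


def cvd_probability(age, bmi, tg, dbp):
    """P = 1 / (1 + exp(-Logit P)), the inverse of the logit."""
    return 1.0 / (1.0 + math.exp(-cvd_logit(age, bmi, tg, dbp)))


def is_high_risk(age, bmi, tg, dbp):
    """True when the predicted probability exceeds the quoted cut-off."""
    return cvd_probability(age, bmi, tg, dbp) > THRESHOLD


if __name__ == "__main__":
    # Illustrative input values only; they are not taken from the study.
    p = cvd_probability(age=60, bmi=25, tg=1.5, dbp=80)
    print(f"predicted probability: {p:.3f}")
    print("above cut-off:", is_high_risk(age=60, bmi=25, tg=1.5, dbp=80))
```

Because the coefficients are rounded to two decimals in the abstract, probabilities computed this way are approximate and are shown only to illustrate how the equation and the 0.51 cut-off fit together.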
format Online
Article
Text
id pubmed-7521325
institution National Center for Biotechnology Information
language English
publishDate 2020
publisher John Wiley and Sons Inc.
record_format MEDLINE/PubMed
spelling pubmed-7521325 2020-10-02 Prediction for cardiovascular diseases based on laboratory data: An analysis of random forest model Su, Xi Xu, Yongyong Tan, Zhijun Wang, Xia Yang, Peng Su, Yani Jiang, Yangyang Qin, Sijia Shang, Lei J Clin Lab Anal Research Articles
John Wiley and Sons Inc. 2020-07-29 /pmc/articles/PMC7521325/ /pubmed/32725839 http://dx.doi.org/10.1002/jcla.23421 Text en © 2020 The Authors. Journal of Clinical Laboratory Analysis published by Wiley Periodicals LLC This is an open access article under the terms of the http://creativecommons.org/licenses/by-nc/4.0/ License, which permits use, distribution and reproduction in any medium, provided the original work is properly cited and is not used for commercial purposes.
title Prediction for cardiovascular diseases based on laboratory data: An analysis of random forest model
topic Research Articles
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7521325/
https://www.ncbi.nlm.nih.gov/pubmed/32725839
http://dx.doi.org/10.1002/jcla.23421