Machine Learning-Based Cardiovascular Disease Prediction Model: A Cohort Study on the Korean National Health Insurance Service Health Screening Database

Background: This study proposes a cardiovascular diseases (CVD) prediction model using machine learning (ML) algorithms based on the National Health Insurance Service-Health Screening datasets. Methods: We extracted 4699 patients aged over 45 as the CVD group, diagnosed according to the internationa...

Bibliographic Details
Main Authors: Kim, Joung Ouk (Ryan), Jeong, Yong-Suk, Kim, Jin Ho, Lee, Jong-Weon, Park, Dougho, Kim, Hyoung-Seop
Format: Online Article Text
Language: English
Published: MDPI 2021
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8229422/
https://www.ncbi.nlm.nih.gov/pubmed/34070504
http://dx.doi.org/10.3390/diagnostics11060943
_version_ 1783712972928974848
author Kim, Joung Ouk (Ryan)
Jeong, Yong-Suk
Kim, Jin Ho
Lee, Jong-Weon
Park, Dougho
Kim, Hyoung-Seop
author_facet Kim, Joung Ouk (Ryan)
Jeong, Yong-Suk
Kim, Jin Ho
Lee, Jong-Weon
Park, Dougho
Kim, Hyoung-Seop
author_sort Kim, Joung Ouk (Ryan)
collection PubMed
description Background: This study proposes a cardiovascular diseases (CVD) prediction model using machine learning (ML) algorithms based on the National Health Insurance Service-Health Screening datasets. Methods: We extracted 4699 patients aged over 45 as the CVD group, diagnosed according to the international classification of diseases system (I20–I25). In addition, 4699 random subjects without CVD diagnosis were enrolled as a non-CVD group. Both groups were matched by age and gender. Various ML algorithms were applied to perform CVD prediction; then, the performances of all the prediction models were compared. Results: The extreme gradient boosting, gradient boosting, and random forest algorithms exhibited the best average prediction accuracy (area under receiver operating characteristic curve (AUROC): 0.812, 0.812, and 0.811, respectively) among all algorithms validated in this study. Based on AUROC, the ML algorithms improved the CVD prediction performance, compared to previously proposed prediction models. Preexisting CVD history was the most important factor contributing to the accuracy of the prediction model, followed by total cholesterol, low-density lipoprotein cholesterol, waist-height ratio, and body mass index. Conclusions: Our results indicate that the proposed health screening dataset-based CVD prediction model using ML algorithms is readily applicable, produces validated results and outperforms the previous CVD prediction models.
format Online
Article
Text
id pubmed-8229422
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher MDPI
record_format MEDLINE/PubMed
spelling pubmed-8229422 2021-06-26 Machine Learning-Based Cardiovascular Disease Prediction Model: A Cohort Study on the Korean National Health Insurance Service Health Screening Database Kim, Joung Ouk (Ryan) Jeong, Yong-Suk Kim, Jin Ho Lee, Jong-Weon Park, Dougho Kim, Hyoung-Seop Diagnostics (Basel) Article Background: This study proposes a cardiovascular diseases (CVD) prediction model using machine learning (ML) algorithms based on the National Health Insurance Service-Health Screening datasets. Methods: We extracted 4699 patients aged over 45 as the CVD group, diagnosed according to the international classification of diseases system (I20–I25). In addition, 4699 random subjects without CVD diagnosis were enrolled as a non-CVD group. Both groups were matched by age and gender. Various ML algorithms were applied to perform CVD prediction; then, the performances of all the prediction models were compared. Results: The extreme gradient boosting, gradient boosting, and random forest algorithms exhibited the best average prediction accuracy (area under receiver operating characteristic curve (AUROC): 0.812, 0.812, and 0.811, respectively) among all algorithms validated in this study. Based on AUROC, the ML algorithms improved the CVD prediction performance, compared to previously proposed prediction models. Preexisting CVD history was the most important factor contributing to the accuracy of the prediction model, followed by total cholesterol, low-density lipoprotein cholesterol, waist-height ratio, and body mass index. Conclusions: Our results indicate that the proposed health screening dataset-based CVD prediction model using ML algorithms is readily applicable, produces validated results and outperforms the previous CVD prediction models. MDPI 2021-05-25 /pmc/articles/PMC8229422/ /pubmed/34070504 http://dx.doi.org/10.3390/diagnostics11060943 Text en © 2021 by the authors. https://creativecommons.org/licenses/by/4.0/ Licensee MDPI, Basel, Switzerland.
This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
spellingShingle Article
Kim, Joung Ouk (Ryan)
Jeong, Yong-Suk
Kim, Jin Ho
Lee, Jong-Weon
Park, Dougho
Kim, Hyoung-Seop
Machine Learning-Based Cardiovascular Disease Prediction Model: A Cohort Study on the Korean National Health Insurance Service Health Screening Database
title Machine Learning-Based Cardiovascular Disease Prediction Model: A Cohort Study on the Korean National Health Insurance Service Health Screening Database
title_full Machine Learning-Based Cardiovascular Disease Prediction Model: A Cohort Study on the Korean National Health Insurance Service Health Screening Database
title_fullStr Machine Learning-Based Cardiovascular Disease Prediction Model: A Cohort Study on the Korean National Health Insurance Service Health Screening Database
title_full_unstemmed Machine Learning-Based Cardiovascular Disease Prediction Model: A Cohort Study on the Korean National Health Insurance Service Health Screening Database
title_short Machine Learning-Based Cardiovascular Disease Prediction Model: A Cohort Study on the Korean National Health Insurance Service Health Screening Database
title_sort machine learning-based cardiovascular disease prediction model: a cohort study on the korean national health insurance service health screening database
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8229422/
https://www.ncbi.nlm.nih.gov/pubmed/34070504
http://dx.doi.org/10.3390/diagnostics11060943