
Development and Evaluation of Machine Learning-Based High-Cost Prediction Model Using Health Check-Up Data by the National Health Insurance Service of Korea

This study analyzed socioeconomic, medical treatment, and health check-up data from 2010 to 2017 held by the National Health Insurance Service (NHIS) of Korea. Socioeconomic, treatment, and health check-up data from a given year are used to develop a predictive model for high medical expenses in the n...


Bibliographic Details
Main authors: Choi, Yeongah, An, Jiho, Ryu, Seiyoung, Kim, Jaekyeong
Format: Online Article Text
Language: English
Published: MDPI 2022
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9603723/
https://www.ncbi.nlm.nih.gov/pubmed/36294248
http://dx.doi.org/10.3390/ijerph192013672
author Choi, Yeongah
An, Jiho
Ryu, Seiyoung
Kim, Jaekyeong
collection PubMed
description This study analyzed socioeconomic, medical treatment, and health check-up data from 2010 to 2017 held by the National Health Insurance Service (NHIS) of Korea. Socioeconomic, treatment, and health check-up data from a given year are used to develop a model that predicts high medical expenses in the following year. A distinguishing feature of this study is that it derives important variables related to users with high medical expenses in Korea from data on the health check-up items administered nationally. We framed high-cost medical expense prediction as a supervised classification task and evaluated the performance of the resulting classifiers. The models used were logistic regression, random forest, and XGBoost, which previous studies have reported as offering the best performance and explanatory power among machine learning algorithms. In our experiments, the XGBoost model performed best, with 77.1% accuracy. The contribution of this study is to identify the variables that affect the prediction of high-cost medical expenses by analyzing medical bills with the health check-up variables and the Korean Classification of Diseases (KCD) major disease groups as input variables. The study confirmed that musculoskeletal disorders (M) and respiratory diseases (J), the most frequently treated disease groups, are important KCD groups for predicting future high costs in Korea. It also confirmed that malignant neoplasms (C), which have a high medical cost per treatment, are a disease group associated with high future medical costs. Unlike previous studies, these results come from an analysis of data for all diseases, so the study is expected to be more meaningful when compared with results from other national health check-up data.
format Online
Article
Text
id pubmed-9603723
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher MDPI
record_format MEDLINE/PubMed
spelling pubmed-9603723 2022-10-27 Development and Evaluation of Machine Learning-Based High-Cost Prediction Model Using Health Check-Up Data by the National Health Insurance Service of Korea Choi, Yeongah An, Jiho Ryu, Seiyoung Kim, Jaekyeong Int J Environ Res Public Health Article MDPI 2022-10-21 /pmc/articles/PMC9603723/ /pubmed/36294248 http://dx.doi.org/10.3390/ijerph192013672 Text en © 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
title Development and Evaluation of Machine Learning-Based High-Cost Prediction Model Using Health Check-Up Data by the National Health Insurance Service of Korea
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9603723/
https://www.ncbi.nlm.nih.gov/pubmed/36294248
http://dx.doi.org/10.3390/ijerph192013672
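The prediction setup summarized in the description (data from one year used to predict high cost in the next year, scored by accuracy) can be sketched roughly as follows. This is only an illustration under stated assumptions: the record does not say how "high cost" is defined, so the top-fraction cutoff, the `EXPENSES` values, and the threshold baseline are all hypothetical, and the baseline stands in for the logistic regression, random forest, and XGBoost models the study actually used.

```python
from typing import Dict, List, Tuple

# Hypothetical records: person_id -> {year: annual medical expense}.
# The values are made up for illustration and are not NHIS data.
EXPENSES: Dict[str, Dict[int, float]] = {
    "p1": {2010: 120.0, 2011: 900.0},
    "p2": {2010: 80.0, 2011: 60.0},
    "p3": {2010: 700.0, 2011: 950.0},
    "p4": {2010: 50.0, 2011: 40.0},
}


def high_cost_threshold(costs: List[float], top_fraction: float) -> float:
    """Return the lowest cost that still falls in the top `top_fraction`.

    Assumption: "high cost" means being in the top fraction of spenders;
    the catalog record does not state the study's actual definition.
    """
    ranked = sorted(costs, reverse=True)
    k = max(1, int(len(ranked) * top_fraction))
    return ranked[k - 1]


def build_pairs(
    expenses: Dict[str, Dict[int, float]],
    feature_year: int,
    top_fraction: float = 0.5,
) -> List[Tuple[float, int]]:
    """Pair each person's year-t cost with a year-(t+1) high-cost label."""
    next_costs = [e[feature_year + 1] for e in expenses.values()]
    cutoff = high_cost_threshold(next_costs, top_fraction)
    return [
        (e[feature_year], int(e[feature_year + 1] >= cutoff))
        for e in expenses.values()
    ]


def accuracy(y_true: List[int], y_pred: List[int]) -> float:
    """Share of predictions that match the true labels."""
    correct = sum(int(t == p) for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


pairs = build_pairs(EXPENSES, feature_year=2010)
y_true = [label for _, label in pairs]
# Naive stand-in classifier (not one of the study's models): predict
# high cost next year when this year's cost is at least 100.
y_pred = [int(cost >= 100.0) for cost, _ in pairs]
print(accuracy(y_true, y_pred))
```

In a fuller version the single cost feature would be replaced by the socioeconomic, check-up, and KCD group variables, and the threshold rule by a trained classifier; the label construction and the accuracy calculation would stay the same.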