
Predicting high health-cost users among people with cardiovascular disease using machine learning and nationwide linked social administrative datasets


Bibliographic Details
Main authors: Nghiem, Nhung; Atkinson, June; Nguyen, Binh P.; Tran-Duy, An; Wilson, Nick
Format: Online Article Text
Language: English
Published: Springer Berlin Heidelberg 2023
Subjects: Research
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9898915/
https://www.ncbi.nlm.nih.gov/pubmed/36738348
http://dx.doi.org/10.1186/s13561-023-00422-1
collection PubMed
description OBJECTIVES: To optimise planning of public health services, the impact of high-cost users needs to be considered. However, most existing statistical models of health costs do not include many of the clinical and social variables, increasingly available in administrative data, that are associated with elevated health care resource use. This study aimed to use machine learning approaches and big data to predict high-cost users among people with cardiovascular disease (CVD). METHODS: We used nationally representative linked datasets in New Zealand to predict which prevalent CVD cases were the most expensive, i.e. belonged to the top quintile by cost. We compared the performance of four popular machine learning models (L1-regularised logistic regression, classification trees, k-nearest neighbours (KNN) and random forest) with traditional logistic regression models. RESULTS: The machine learning models performed far better than the traditional logistic models in predicting high health-cost users. Their F1 score (the harmonic mean of sensitivity and positive predictive value) ranged from 30.6% to 41.2%, compared with 8.6–9.1% for the logistic models. Previous health costs, income, age, chronic health conditions, deprivation, and receipt of a social security benefit were among the most important predictors of CVD high-cost users. CONCLUSIONS: This study provides additional evidence that machine learning, combined with big data, can be used in health economics to identify new risk factors and predict high-cost users with CVD. As such, machine learning may assist with health services planning and preventive measures to improve population health, while potentially saving healthcare costs. SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1186/s13561-023-00422-1.
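The abstract defines high-cost users by the top cost quintile and scores the models with F1, the harmonic mean of sensitivity and positive predictive value. The Python below is a minimal sketch of those two steps only, not the authors' code: it assumes the top quintile is the n // 5 most expensive people (ties broken by original order), and the function names `top_quintile_labels` and `f1_score` and the toy costs are hypothetical.

```python
from typing import List, Sequence


def top_quintile_labels(costs: Sequence[float]) -> List[int]:
    """Return 1 for people in the most expensive 20% of `costs`, else 0.

    Assumption (illustrative only): the top quintile is taken to be the
    n // 5 highest-cost people; ties are broken by original order.
    """
    n = len(costs)
    k = n // 5  # size of the top-quintile group, rounded down
    # Indices sorted from most to least expensive (sort is stable, so ties keep input order).
    order = sorted(range(n), key=lambda i: costs[i], reverse=True)
    top = set(order[:k])
    return [1 if i in top else 0 for i in range(n)]


def f1_score(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """F1 = harmonic mean of sensitivity (recall) and positive predictive value (precision)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0  # no true positives: F1 is 0 (or undefined), reported here as 0
    sensitivity = tp / (tp + fn)
    ppv = tp / (tp + fp)
    return 2 * sensitivity * ppv / (sensitivity + ppv)


if __name__ == "__main__":
    # Hypothetical costs for ten people (not data from the study).
    costs = [100, 2500, 300, 8000, 450, 120, 9000, 600, 700, 50]
    labels = top_quintile_labels(costs)
    print(labels)  # [0, 0, 0, 1, 0, 0, 1, 0, 0, 0]

    # A model that finds one true high-cost user and raises one false alarm.
    print(f1_score(labels, [0, 1, 0, 1, 0, 0, 0, 0, 0, 0]))  # 0.5

    # Predicting "nobody is high-cost" is right for 8 of 10 people, yet F1 is 0.
    print(f1_score(labels, [0] * 10))  # 0.0
```

Because only about one person in five is a high-cost user by construction, plain accuracy can look good for a model that never flags anyone; F1 falls to zero in that case, which makes it a more informative summary for an imbalanced outcome like this one.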
id pubmed-9898915
institution National Center for Biotechnology Information
record_format MEDLINE/PubMed
spelling pubmed-9898915 2023-02-05 Health Econ Rev Research Springer Berlin Heidelberg 2023-02-04 /pmc/articles/PMC9898915/ /pubmed/36738348 http://dx.doi.org/10.1186/s13561-023-00422-1 Text en © The Author(s) 2023 https://creativecommons.org/licenses/by/4.0/ Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ (https://creativecommons.org/licenses/by/4.0/). The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/ (https://creativecommons.org/publicdomain/zero/1.0/)) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
topic Research