The application of machine learning to predict high-cost patients: A performance-comparison of different models using healthcare claims data

Bibliographic Details
Main authors:	Langenberger, Benedikt; Schulte, Timo; Groene, Oliver
Format:	Online Article Text
Language:	English
Published:	Public Library of Science, 2023
Subjects:	Research Article
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9847900/ https://www.ncbi.nlm.nih.gov/pubmed/36652450 http://dx.doi.org/10.1371/journal.pone.0279540

author	Langenberger, Benedikt; Schulte, Timo; Groene, Oliver
collection	PubMed
description	Our aim was to predict future high-cost patients with machine learning using healthcare claims data. We applied a random forest (RF), a gradient boosting machine (GBM), an artificial neural network (ANN) and a logistic regression (LR) to predict high-cost patients in the following year. To this end, we used routinely collected sickness fund claims and cost data from 2016, 2017 and 2018. Various specifications of each algorithm were trained and cross-validated on training data (n = 20,984) with claims and cost data from 2016 and outcomes from 2017. The best-performing specification of each algorithm was selected based on its performance on the validation data. For the performance comparison, the selected models were applied to unseen test data with features from 2017 and outcomes from 2018 (n = 21,146). The RF performed best as measured by the area under the receiver operating characteristic curve (AUC), with a value of 0.883 (95% confidence interval (CI): 0.872–0.893) on the test data, followed by the GBM (AUC = 0.878; 95% CI: 0.867–0.889). The ANN (AUC = 0.846; 95% CI: 0.834–0.857) and LR (AUC = 0.839; 95% CI: 0.826–0.852) were significantly outperformed by the GBM and the RF. All ML algorithms and the LR performed ‘good’ (i.e., 0.9 > AUC ≥ 0.8). We were able to develop machine learning models that predict high-cost patients with ‘good’ performance using routinely collected sickness fund claims and cost data. We found that tree-based models performed best and outperformed the ANN and LR.
format	Online Article Text
id	pubmed-9847900
institution	National Center for Biotechnology Information
language	English
publishDate	2023
publisher	Public Library of Science
record_format	MEDLINE/PubMed
spelling	pubmed-9847900; 2023-01-19; PLoS One; Research Article; Public Library of Science, 2023-01-18; /pmc/articles/PMC9847900/; /pubmed/36652450; http://dx.doi.org/10.1371/journal.pone.0279540; Text; en; © 2023 Langenberger et al.; https://creativecommons.org/licenses/by/4.0/. This is an open access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
title	The application of machine learning to predict high-cost patients: A performance-comparison of different models using healthcare claims data
topic	Research Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9847900/ https://www.ncbi.nlm.nih.gov/pubmed/36652450 http://dx.doi.org/10.1371/journal.pone.0279540
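The abstract reports discrimination as an AUC with a 95% CI per model and calls 0.9 > AUC ≥ 0.8 ‘good’. As an illustration only, the Python sketch below computes an AUC from binary labels (1 = high-cost in the following year) and model scores via the Mann-Whitney rank formulation, a percentile-bootstrap 95% CI, and the ‘good’ label. The abstract does not state how the authors derived their CIs, so the percentile bootstrap is an assumption. The function names (auc, bootstrap_auc_ci, rate_auc) and the toy and synthetic data are hypothetical and do not come from the study.

```python
import random
from typing import List, Sequence, Tuple


def auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC via the Mann-Whitney U statistic (labels must be 0/1).

    Equals the probability that a randomly chosen positive case gets a
    higher score than a randomly chosen negative case, ties counted as 1/2.
    """
    n = len(scores)
    if len(y_true) != n:
        raise ValueError("y_true and scores must have the same length")
    n_pos = sum(1 for y in y_true if y == 1)
    n_neg = n - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUC needs at least one positive and one negative case")

    # Assign 1-based ranks by score; tied scores share their average rank.
    order = sorted(range(n), key=lambda i: scores[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1

    # U = (sum of positive ranks) - n_pos(n_pos + 1)/2;  AUC = U / (n_pos * n_neg)
    rank_sum_pos = sum(r for r, y in zip(ranks, y_true) if y == 1)
    u = rank_sum_pos - n_pos * (n_pos + 1) / 2
    return u / (n_pos * n_neg)


def bootstrap_auc_ci(y_true: Sequence[int], scores: Sequence[float],
                     n_boot: int = 1000, alpha: float = 0.05,
                     seed: int = 0) -> Tuple[float, float]:
    """Percentile-bootstrap (1 - alpha) CI for the AUC (assumed CI method)."""
    auc(y_true, scores)  # raises early if the input lacks one of the classes
    n = len(y_true)
    rng = random.Random(seed)
    stats: List[float] = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        yb = [y_true[i] for i in idx]
        # Skip resamples that contain only one class (AUC undefined there).
        if 0 < sum(yb) < n:
            stats.append(auc(yb, [scores[i] for i in idx]))
    stats.sort()
    k = int(round(alpha / 2 * n_boot))  # number of draws cut from each tail
    return stats[k], stats[n_boot - 1 - k]


def rate_auc(value: float) -> str:
    """Apply the band named in the abstract: 0.9 > AUC >= 0.8 is 'good'."""
    return "good" if 0.8 <= value < 0.9 else "not in the 'good' band"


if __name__ == "__main__":
    # Toy data (1 = high-cost in the following year); not from the study.
    print(auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # → 0.75

    # Synthetic sample (about 10% positives) to exercise the bootstrap CI.
    rng = random.Random(42)
    y_syn = [1 if rng.random() < 0.1 else 0 for _ in range(500)]
    s_syn = [rng.gauss(1.0 if y else 0.0, 1.0) for y in y_syn]
    print(auc(y_syn, s_syn), bootstrap_auc_ci(y_syn, s_syn, n_boot=500))

    # The test-set AUCs reported in the abstract all fall in the 'good' band.
    for name, value in {"RF": 0.883, "GBM": 0.878, "ANN": 0.846, "LR": 0.839}.items():
        print(name, value, rate_auc(value))
```

The rank formulation runs in O(n log n), which matters at the size of the test set reported above (n = 21,146). In practice a library routine such as scikit-learn's roc_auc_score would typically supply the point estimate; the hand-written version only makes the computation explicit.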