Comparison of statistical and machine learning models for healthcare cost data: a simulation study motivated by Oncology Care Model (OCM) data

Bibliographic details
Main authors: Mazumdar, Madhu, Lin, Jung-Yi Joyce, Zhang, Wei, Li, Lihua, Liu, Mark, Dharmarajan, Kavita, Sanderson, Mark, Isola, Luis, Hu, Liangyuan
Format: Online, Article, Text
Language: English
Published: BioMed Central 2020
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7183716/
https://www.ncbi.nlm.nih.gov/pubmed/32334595
http://dx.doi.org/10.1186/s12913-020-05148-y
collection PubMed
description BACKGROUND: The Oncology Care Model (OCM) was developed as a payment model to encourage participating practices to provide better-quality care for cancer patients at a lower cost. The risk-adjustment model used in OCM is a Gamma generalized linear model (Gamma GLM) with a log link. For the episodes identified at our academic medical center (AMC), the expenses predicted by the model fitted to national data did not correlate well with our observed expenses. This motivated us to fit the Gamma GLM to our AMC data and compare it with two other flexible modeling methods: Random Forest (RF) and Partially Linear Additive Quantile Regression (PLAQR). We also performed a simulation study to assess the comparative performance of these methods and to examine the impact of nonlinearity and interaction effects, two understudied aspects of cost prediction.
METHODS: The simulated cost outcome was generated from four distributions: Gamma, Weibull, log-normal with a heteroscedastic error term, and a heavy-tailed distribution. Simulation parameters both similar to and different from those of the OCM data were considered. The performance metrics were root mean square error (RMSE), mean absolute prediction error (MAPE), and cost accuracy (CA). Bootstrap resampling was used to estimate the operating characteristics of these metrics, which are summarized with boxplots.
RESULTS: RF performed best, with the lowest RMSE and MAPE and the highest CA, in most scenarios. When the models were misspecified, the differences in their performance became larger. Model performance differed more for non-exponential than for exponential outcome distributions.
CONCLUSIONS: RF outperformed Gamma GLM and PLAQR in predicting both overall and top-decile costs. RF demonstrated improved prediction under various scenarios common in healthcare cost modeling. Additionally, RF did not require prespecification of the outcome distribution, nonlinear effects, or interaction terms. Therefore, RF appears to be the best tool for predicting average cost. However, when the goal is to estimate extreme expenses, e.g., high-cost episodes, the accuracy gained by RF may need to be weighed against its computational cost.
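The METHODS summary names the evaluation metrics and the bootstrap step but does not define them. The Python sketch below illustrates them under stated assumptions. RMSE is the usual root mean square error. MAPE is read literally as mean absolute prediction error, an absolute error in cost units rather than a percentage. The cost outcome is drawn from a Gamma distribution whose mean follows a log link, the same family as the OCM risk-adjustment model. The coefficients, sample size, and helper names (`simulate_gamma_costs`, `bootstrap_metric`, `mean_abs_prediction_error`) are hypothetical and not taken from the paper. Cost accuracy (CA) is omitted because its formula is not given in this record. The bootstrap here only resamples held-out (observed, predicted) pairs and does not refit Gamma GLM, RF, or PLAQR.

```python
import math
import random
import statistics


def rmse(y, y_hat):
    """Root mean square error between observed and predicted costs."""
    return math.sqrt(statistics.fmean((a - b) ** 2 for a, b in zip(y, y_hat)))


def mean_abs_prediction_error(y, y_hat):
    """Mean absolute prediction error, in cost units (not a percentage)."""
    return statistics.fmean(abs(a - b) for a, b in zip(y, y_hat))


def simulate_gamma_costs(n, b0=7.0, b1=1.5, shape=2.0, seed=0):
    """Hypothetical Gamma outcome with a log link: log(mu) = b0 + b1 * x, x ~ U(0, 1).

    Parameter values are illustrative only, not estimates from OCM data.
    """
    rng = random.Random(seed)
    xs = [rng.random() for _ in range(n)]
    mus = [math.exp(b0 + b1 * x) for x in xs]
    # gammavariate(alpha, beta) has mean alpha * beta, so E[y | x] = mu.
    ys = [rng.gammavariate(shape, mu / shape) for mu in mus]
    return xs, mus, ys


def bootstrap_metric(metric, y, y_hat, n_boot=200, seed=1):
    """Recompute `metric` on n_boot resamples of (y, y_hat) pairs drawn with replacement."""
    rng = random.Random(seed)
    n = len(y)
    values = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        values.append(metric([y[i] for i in idx], [y_hat[i] for i in idx]))
    return values


if __name__ == "__main__":
    _, mus, ys = simulate_gamma_costs(2000)
    predictions = {
        "oracle mean": mus,                            # true conditional mean
        "constant": [statistics.fmean(ys)] * len(ys),  # one overall-mean prediction
    }
    for name, pred in predictions.items():
        # Quartiles of the bootstrap RMSE distribution (what a boxplot would show).
        q1, _, q3 = statistics.quantiles(bootstrap_metric(rmse, ys, pred), n=4)
        print(f"{name}: RMSE={rmse(ys, pred):.0f}, "
              f"MAPE={mean_abs_prediction_error(ys, pred):.0f}, "
              f"bootstrap RMSE IQR=({q1:.0f}, {q3:.0f})")
```

MAPE, read this way, is an average of absolute errors. It therefore cannot exceed RMSE on the same predictions, which gives a quick sanity check on any implementation. Replacing `rng.gammavariate` with `rng.weibullvariate` or `rng.lognormvariate` would give rough analogues of two of the other outcome distributions, provided the parameters are chosen suitably. The study's own parameterizations are not given in this record.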
id pubmed-7183716
institution National Center for Biotechnology Information
record_format MEDLINE/PubMed
spelling pubmed-7183716 2020-04-30 BMC Health Serv Res Research Article BioMed Central 2020-04-25
© The Author(s) 2020. Open Access: This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
topic Research Article