
Assessing the Value of Unsupervised Clustering in Predicting Persistent High Health Care Utilizers: Retrospective Analysis of Insurance Claims Data

BACKGROUND: A high proportion of health care services are persistently utilized by a small subpopulation of patients. To improve clinical outcomes while reducing costs and utilization, population health management programs often provide targeted interventions to patients who may become persistent hi...


Bibliographic Details
Main Authors: Ramachandran, Raghav; McShea, Michael J; Howson, Stephanie N; Burkom, Howard S; Chang, Hsien-Yen; Weiner, Jonathan P; Kharrazi, Hadi
Format: Online Article Text
Language: English
Published: JMIR Publications 2021
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8663459/
https://www.ncbi.nlm.nih.gov/pubmed/34592712
http://dx.doi.org/10.2196/31442
author Ramachandran, Raghav
McShea, Michael J
Howson, Stephanie N
Burkom, Howard S
Chang, Hsien-Yen
Weiner, Jonathan P
Kharrazi, Hadi
collection PubMed
description BACKGROUND: A high proportion of health care services are persistently utilized by a small subpopulation of patients. To improve clinical outcomes while reducing costs and utilization, population health management programs often provide targeted interventions to patients who may become persistent high users/utilizers (PHUs). Enhanced prediction and management of PHUs can improve health care system efficiencies and improve the overall quality of patient care.
OBJECTIVE: The aim of this study was to detect key classes of diseases and medications among the study population and to assess the predictive value of these classes in identifying PHUs.
METHODS: This study was a retrospective analysis of insurance claims data of patients from the Johns Hopkins Health Care system. We defined a PHU as a patient incurring health care costs in the top 20% of all patients’ costs for 4 consecutive 6-month periods. We used 2013 claims data to predict PHU status in 2014-2015. We applied latent class analysis (LCA), an unsupervised clustering approach, to identify patient subgroups with similar diagnostic and medication patterns to differentiate variations in health care utilization across PHUs. Logistic regression models were then built to predict PHUs in the full population and in select subpopulations. Predictors included LCA membership probabilities, demographic covariates, and health utilization covariates. Predictive powers of the regression models were assessed and compared using standard metrics.
RESULTS: We identified 164,221 patients with continuous enrollment between 2013 and 2015. The mean study population age was 19.7 years, 55.9% were women, 3.3% had ≥1 hospitalization, and 19.1% had 10+ outpatient visits in 2013. A total of 8359 (5.09%) patients were identified as PHUs in both 2014 and 2015. The LCA performed optimally when assigning patients to four probability disease/medication classes. Given the feedback provided by clinical experts, we further divided the population into four diagnostic groups for sensitivity analysis: acute upper respiratory infection (URI) (n=53,232; 4.6% PHUs), mental health (n=34,456; 12.8% PHUs), otitis media (n=24,992; 4.5% PHUs), and musculoskeletal (n=24,799; 15.5% PHUs). For the regression models predicting PHUs in the full population, the F1-score classification metric was lower using a parsimonious model that included LCA categories (F1=38.62%) compared to that of a complex risk stratification model with a full set of predictors (F1=48.20%). However, the LCA-enabled simple models were comparable to the complex model when predicting PHUs in the mental health and musculoskeletal subpopulations (F1-scores of 48.69% and 48.15%, respectively). F1-scores were lower than that of the complex model when the LCA-enabled models were limited to the otitis media and acute URI subpopulations (45.77% and 43.05%, respectively).
CONCLUSIONS: Our study illustrates the value of LCA in identifying subgroups of patients with similar patterns of diagnoses and medications. Our results show that LCA-derived classes can simplify predictive models of PHUs without compromising predictive accuracy. Future studies should investigate the value of LCA-derived classes for predicting PHUs in other health care settings.
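The PHU definition and the F1-score metric described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names, the toy cost data, and the tie-handling rule (a patient at exactly the cutoff counts as being in the top 20%) are all assumptions made for illustration, and the abstract does not say how the 20% threshold was computed within each period.

```python
from typing import Dict, List


def top_share_cutoff(costs: List[float], share: float = 0.20) -> float:
    """Return the smallest cost that still falls within the top `share` of patients.

    Assumption: at least one patient is always counted as "top", and ties at
    the cutoff are included.
    """
    ranked = sorted(costs, reverse=True)
    n_top = max(1, int(len(ranked) * share))
    return ranked[n_top - 1]


def flag_phus(period_costs: Dict[str, List[float]], share: float = 0.20) -> Dict[str, bool]:
    """Flag patients whose cost is in the top `share` in every period.

    `period_costs` maps a (hypothetical) patient id to one cost per
    consecutive 6-month period, e.g. four values for 2014-2015.
    """
    n_periods = len(next(iter(period_costs.values())))
    # One cutoff per period, computed across all patients in that period.
    cutoffs = [
        top_share_cutoff([costs[p] for costs in period_costs.values()], share)
        for p in range(n_periods)
    ]
    return {
        pid: all(costs[p] >= cutoffs[p] for p in range(n_periods))
        for pid, costs in period_costs.items()
    }


def f1_score(tp: int, fp: int, fn: int) -> float:
    """Harmonic mean of precision and recall; returns 0.0 when undefined."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


# Hypothetical toy data: 5 patients, 4 six-month periods each.
toy_costs = {
    "p1": [900.0, 950.0, 990.0, 1000.0],  # highest cost in every period
    "p2": [100.0, 120.0, 90.0, 80.0],
    "p3": [800.0, 50.0, 40.0, 30.0],      # high once, then low
    "p4": [200.0, 210.0, 190.0, 220.0],
    "p5": [10.0, 20.0, 15.0, 5.0],
}
phu_flags = flag_phus(toy_costs)
```

In this toy data only `p1` stays above the cutoff in all four periods, so only `p1` is flagged. Because PHUs are rare (5.09% of the study population), a metric such as F1 that ignores true negatives is a natural choice for comparing models here, which is presumably why the abstract reports it.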
format Online
Article
Text
id pubmed-8663459
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher JMIR Publications
record_format MEDLINE/PubMed
spelling pubmed-8663459 2022-01-05 JMIR Med Inform Original Paper JMIR Publications 2021-11-25 /pmc/articles/PMC8663459/ /pubmed/34592712 http://dx.doi.org/10.2196/31442 Text en
©Raghav Ramachandran, Michael J McShea, Stephanie N Howson, Howard S Burkom, Hsien-Yen Chang, Jonathan P Weiner, Hadi Kharrazi. Originally published in JMIR Medical Informatics (https://medinform.jmir.org), 25.11.2021.
https://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in JMIR Medical Informatics, is properly cited. The complete bibliographic information, a link to the original publication on https://medinform.jmir.org/, as well as this copyright and license information must be included.
title Assessing the Value of Unsupervised Clustering in Predicting Persistent High Health Care Utilizers: Retrospective Analysis of Insurance Claims Data
topic Original Paper
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8663459/
https://www.ncbi.nlm.nih.gov/pubmed/34592712
http://dx.doi.org/10.2196/31442