
A machine learning approach to identify distinct subgroups of veterans at risk for hospitalization or death using administrative and electronic health record data

BACKGROUND: Identifying individuals at risk for future hospitalization or death has been a major priority of population health management strategies. High-risk individuals are a heterogeneous group, and existing studies describing heterogeneity in high-risk individuals have been limited by data focu...

Bibliographic Details
Main Authors: Parikh, Ravi B., Linn, Kristin A., Yan, Jiali, Maciejewski, Matthew L., Rosland, Ann-Marie, Volpp, Kevin G., Groeneveld, Peter W., Navathe, Amol S.
Format: Online Article Text
Language: English
Published: Public Library of Science 2021
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7894856/
https://www.ncbi.nlm.nih.gov/pubmed/33606819
http://dx.doi.org/10.1371/journal.pone.0247203
_version_ 1783653312728399872
author Parikh, Ravi B.
Linn, Kristin A.
Yan, Jiali
Maciejewski, Matthew L.
Rosland, Ann-Marie
Volpp, Kevin G.
Groeneveld, Peter W.
Navathe, Amol S.
author_sort Parikh, Ravi B.
collection PubMed
description BACKGROUND: Identifying individuals at risk for future hospitalization or death has been a major priority of population health management strategies. High-risk individuals are a heterogeneous group, and existing studies describing heterogeneity in high-risk individuals have been limited by data focused on clinical comorbidities and not socioeconomic or behavioral factors. We used machine learning clustering methods and linked comorbidity-based, sociodemographic, and psychobehavioral data to identify subgroups of high-risk Veterans and study long-term outcomes, hypothesizing that factors other than comorbidities would characterize several subgroups. METHODS AND FINDINGS: In this cross-sectional study, we used data from the VA Corporate Data Warehouse, a national repository of VA administrative claims and electronic health data. To identify high-risk Veterans, we used the Care Assessment Needs (CAN) score, a routinely used VA model that predicts a patient’s percentile risk of hospitalization or death at one year. Our study population consisted of 110,000 Veterans who were randomly sampled from 1,920,436 Veterans with a CAN score ≥75th percentile in 2014. We categorized patient-level data into 119 independent variables based on demographics, comorbidities, pharmacy, vital signs, laboratories, and prior utilization. We used a previously validated density-based clustering algorithm to identify 30 subgroups of high-risk Veterans ranging in size from 50 to 2,446 patients. Mean CAN score ranged from 72.4 to 90.3 among subgroups. Two-year mortality ranged from 0.9% to 45.6% and was highest in the home-based care and metastatic cancer subgroups. Mean inpatient days ranged from 1.4 to 30.5 and were highest in the post-surgery and blood loss anemia subgroups. Mean emergency room visits ranged from 1.0 to 4.3 and were highest in the chronic sedative use and polysubstance use with amphetamine predominance subgroups. Five subgroups were distinguished by psychobehavioral factors and four subgroups were distinguished by sociodemographic factors. CONCLUSIONS: High-risk Veterans are a heterogeneous population consisting of multiple distinct subgroups, many of which are not defined by clinical comorbidities, with distinct utilization and outcome patterns. To our knowledge, this represents the largest application of ML clustering methods to subgroup a high-risk population. Further study is needed to determine whether distinct subgroups may benefit from individualized interventions.
format Online
Article
Text
id pubmed-7894856
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-7894856 2021-03-01 PLoS One Research Article Public Library of Science 2021-02-19 /pmc/articles/PMC7894856/ /pubmed/33606819 http://dx.doi.org/10.1371/journal.pone.0247203 Text en © 2021 Parikh et al http://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
title A machine learning approach to identify distinct subgroups of veterans at risk for hospitalization or death using administrative and electronic health record data
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7894856/
https://www.ncbi.nlm.nih.gov/pubmed/33606819
http://dx.doi.org/10.1371/journal.pone.0247203