A machine learning approach to identify distinct subgroups of veterans at risk for hospitalization or death using administrative and electronic health record data
BACKGROUND: Identifying individuals at risk for future hospitalization or death has been a major priority of population health management strategies. High-risk individuals are a heterogeneous group, and existing studies describing heterogeneity in high-risk individuals have been limited by data focu...
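Main Authors: | Parikh, Ravi B.; Linn, Kristin A.; Yan, Jiali; Maciejewski, Matthew L.; Rosland, Ann-Marie; Volpp, Kevin G.; Groeneveld, Peter W.; Navathe, Amol S. |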
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Public Library of Science 2021 |
Subjects: | |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7894856/ https://www.ncbi.nlm.nih.gov/pubmed/33606819 http://dx.doi.org/10.1371/journal.pone.0247203 |
_version_ | 1783653312728399872 |
---|---|
author | Parikh, Ravi B. Linn, Kristin A. Yan, Jiali Maciejewski, Matthew L. Rosland, Ann-Marie Volpp, Kevin G. Groeneveld, Peter W. Navathe, Amol S. |
author_sort | Parikh, Ravi B. |
collection | PubMed |
description | BACKGROUND: Identifying individuals at risk for future hospitalization or death has been a major priority of population health management strategies. High-risk individuals are a heterogeneous group, and existing studies describing heterogeneity in high-risk individuals have been limited by data focused on clinical comorbidities and not socioeconomic or behavioral factors. We used machine learning clustering methods and linked comorbidity-based, sociodemographic, and psychobehavioral data to identify subgroups of high-risk Veterans and study long-term outcomes, hypothesizing that factors other than comorbidities would characterize several subgroups. METHODS AND FINDINGS: In this cross-sectional study, we used data from the VA Corporate Data Warehouse, a national repository of VA administrative claims and electronic health data. To identify high-risk Veterans, we used the Care Assessment Needs (CAN) score, a routinely used VA model that predicts a patient’s percentile risk of hospitalization or death at one year. Our study population consisted of 110,000 Veterans who were randomly sampled from 1,920,436 Veterans with a CAN score ≥75th percentile in 2014. We categorized patient-level data into 119 independent variables based on demographics, comorbidities, pharmacy, vital signs, laboratories, and prior utilization. We used a previously validated density-based clustering algorithm to identify 30 subgroups of high-risk Veterans ranging in size from 50 to 2,446 patients. Mean CAN score ranged from 72.4 to 90.3 among subgroups. Two-year mortality ranged from 0.9% to 45.6% and was highest in the home-based care and metastatic cancer subgroups. Mean inpatient days ranged from 1.4 to 30.5 and were highest in the post-surgery and blood loss anemia subgroups. Mean emergency room visits ranged from 1.0 to 4.3 and were highest in the chronic sedative use and polysubstance use with amphetamine predominance subgroups. 
Five subgroups were distinguished by psychobehavioral factors and four subgroups were distinguished by sociodemographic factors. CONCLUSIONS: High-risk Veterans are a heterogeneous population consisting of multiple distinct subgroups–many of which are not defined by clinical comorbidities–with distinct utilization and outcome patterns. To our knowledge, this represents the largest application of ML clustering methods to subgroup a high-risk population. Further study is needed to determine whether distinct subgroups may benefit from individualized interventions. |
format | Online Article Text |
id | pubmed-7894856 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2021 |
publisher | Public Library of Science |
record_format | MEDLINE/PubMed |
spelling | pubmed-7894856 2021-03-01 PLoS One Research Article Public Library of Science 2021-02-19 /pmc/articles/PMC7894856/ /pubmed/33606819 http://dx.doi.org/10.1371/journal.pone.0247203 Text en © 2021 Parikh et al http://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. |
title | A machine learning approach to identify distinct subgroups of veterans at risk for hospitalization or death using administrative and electronic health record data |
topic | Research Article |