Novel integration of governmental data sources using machine learning to identify super-utilization among U.S. counties

Bibliographic Details
Main authors: Ricket, Iben M., Matheny, Michael E., MacKenzie, Todd A., Emond, Jennifer A., Ailawadi, Kusum L., Brown, Jeremiah R.
Format: Online Article Text
Language: English
Published: 2023
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10358365/
https://www.ncbi.nlm.nih.gov/pubmed/37476591
http://dx.doi.org/10.1016/j.ibmed.2023.100093
collection PubMed
description BACKGROUND: Super-utilizers consume the greatest share of resource-intensive healthcare (RIHC), and reducing their utilization remains a crucial challenge for healthcare systems in the United States (U.S.). The objective of this study was to predict RIHC among U.S. counties using routinely collected data from the U.S. government, including information on consumer spending, offering an alternative method for identifying super-utilization among population units rather than individuals. METHODS: Cross-sectional data from 5 governmental sources in 2017 were used in a machine learning pipeline in which target-prediction features were selected and used in 4 distinct algorithms. Outcome metrics of RIHC utilization came from the American Hospital Association and included yearly (1) emergency room visits, (2) inpatient days, and (3) hospital expenditures. Target-prediction features included 149 demographic characteristics from the U.S. Census Bureau, 151 adult and child health characteristics from the Centers for Disease Control and Prevention, 151 community characteristics from the American Community Survey, and 571 consumer expenditures from the Bureau of Labor Statistics. SHAP (SHapley Additive exPlanations) analysis identified important target-prediction features for the 3 RIHC outcome metrics. RESULTS: 2475 counties with emergency rooms and 2491 counties with hospitals were included. The median number of yearly emergency room visits per capita was 0.450 [IQR: 0.318, 0.618], the median number of inpatient days per capita was 0.368 [IQR: 0.176, 0.826], and the median hospital expenditure per capita was $2104 [IQR: $1299.93, $3362.97]. The coefficient of determination (R²), calculated on the test set, ranged from 0.267 to 0.447. Demographic and community characteristics were among the important predictors for all 3 RIHC outcome metrics. CONCLUSIONS: Integrating diverse population characteristics from numerous governmental sources, we predicted 3 outcome metrics of RIHC among U.S. counties with good performance, offering a novel and actionable tool for identifying super-utilizer segments in the population. Wider integration of routinely collected data can be used to develop alternative methods for predicting RIHC among population units.
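The per-capita medians, interquartile ranges, and test-set R² reported in the abstract can be illustrated with a minimal Python sketch using only the standard library. The function names (per_capita, median_iqr, r_squared) and all numbers are hypothetical. The quartile method (linear interpolation) is an assumption because the abstract does not state one. No model from the study is reproduced, since the abstract does not name its 4 algorithms.

```python
from statistics import median, quantiles
from typing import Sequence


def per_capita(counts: Sequence[float], populations: Sequence[float]) -> list[float]:
    """Divide a yearly county-level total (e.g., ER visits) by county population."""
    if len(counts) != len(populations):
        raise ValueError("counts and populations must have the same length")
    return [c / p for c, p in zip(counts, populations)]


def median_iqr(values: Sequence[float]) -> tuple[float, float, float]:
    """Return (median, Q1, Q3).

    Assumption: quartiles use linear interpolation between order statistics
    (method="inclusive"); the abstract does not say which method was used.
    """
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    return median(values), q1, q3


def r_squared(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination, R^2 = 1 - SS_res / SS_tot, on held-out data."""
    if len(y_true) != len(y_pred) or len(y_true) < 2:
        raise ValueError("need at least two paired observations")
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))  # residual sum of squares
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares
    return 1.0 - ss_res / ss_tot


if __name__ == "__main__":
    # Hypothetical counties: yearly ER visits and populations (not study data).
    rates = per_capita([450, 900, 300, 1200], [1000, 2000, 1000, 2000])
    print(median_iqr(rates))
    # Hypothetical test-set outcomes vs. predictions from some fitted model.
    print(r_squared([1.0, 2.0, 3.0, 4.0], [1.5, 2.0, 2.5, 4.0]))
```

The R² helper follows the usual definition, in which the total sum of squares is taken around the mean of the observed test values. This is also how scikit-learn's r2_score behaves, so a value such as 0.267 means the model explains roughly a quarter of the test-set variance in the outcome.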
id pubmed-10358365
institution National Center for Biotechnology Information
record_format MEDLINE/PubMed
journal Intell Based Med
dates 2023-07-20, 2023-01-21
license This is an open access article under the CC BY-NC-ND license (https://creativecommons.org/licenses/by-nc-nd/4.0/).
topic Article