Clinical, social, and policy factors in COVID-19 cases and deaths: methodological considerations for feature selection and modeling in county-level analyses
BACKGROUND: There is a need to evaluate how the choice of time interval contributes to the lack of consistency of SDoH variables that appear as important to COVID-19 disease burden within an analysis for both case counts and death counts. METHODS: This study identified SDoH variables associated with...
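Main authors: | Madlock-Brown, Charisse; Wilkens, Ken; Weiskopf, Nicole; Cesare, Nina; Bhattacharyya, Sharmodeep; Riches, Naomi O.; Espinoza, Juan; Dorr, David; Goetz, Kerry; Phuong, Jimmy; Sule, Anupam; Kharrazi, Hadi; Liu, Feifan; Lemon, Cindy; Adams, William G. |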
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central 2022 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9008430/ https://www.ncbi.nlm.nih.gov/pubmed/35421958 http://dx.doi.org/10.1186/s12889-022-13168-y |
_version_ | 1784687052970786816 |
---|---|
author | Madlock-Brown, Charisse Wilkens, Ken Weiskopf, Nicole Cesare, Nina Bhattacharyya, Sharmodeep Riches, Naomi O. Espinoza, Juan Dorr, David Goetz, Kerry Phuong, Jimmy Sule, Anupam Kharrazi, Hadi Liu, Feifan Lemon, Cindy Adams, William G. |
author_sort | Madlock-Brown, Charisse |
collection | PubMed |
description | BACKGROUND: There is a need to evaluate how the choice of time interval contributes to the lack of consistency in the SDoH variables that appear important to COVID-19 disease burden within an analysis, for both case counts and death counts. METHODS: This study identified SDoH variables associated with U.S. county-level COVID-19 cumulative case and death incidence for six different periods: the first 30, 60, 90, 120, 150, and 180 days after each county reached one COVID-19 case per 10,000 residents. The SDoH variables fell into the following domains: resource deprivation, access to care/health resources, population characteristics, traveling behavior, vulnerable populations, and health status. A generalized variance inflation factor (GVIF) analysis was used to identify variables with high multicollinearity. For each dependent variable, a separate model was built for each of the time periods. We used mixed-effects generalized linear modeling of counts normalized per 100,000 population, with negative binomial regression. We performed a Kolmogorov-Smirnov goodness-of-fit test, an outlier test, and a dispersion test for each model. Sensitivity analysis included altering the county start date to the day each county reached 10 COVID-19 cases per 10,000. RESULTS: Ninety-seven percent (3059/3140) of the counties were represented in the final analysis. Six features proved important for both the main and sensitivity analysis: adults-with-college-degree, days-sheltering-in-place-at-start, prior-seven-day-median-time-home, percent-black, percent-foreign-born, over-65-years-of-age, black-white-segregation, and days-since-pandemic-start. These variables belonged to the following categories: COVID-19 related, vulnerable populations, and population characteristics. Our diagnostic results show that, across our outcomes, the models of the shorter time periods (30 days, 60 days, and 90 days) have a better fit. CONCLUSION: Our findings demonstrate that the set of SDoH features that are significant for COVID-19 outcomes varies with the time elapsed since the start of the pandemic and since COVID-19 was first present in a county. These results could assist researchers with variable selection and inform decision makers when creating public health policy. SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1186/s12889-022-13168-y. |
format | Online Article Text |
id | pubmed-9008430 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2022 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-9008430 2022-04-14 BMC Public Health Research BioMed Central 2022-04-14 /pmc/articles/PMC9008430/ /pubmed/35421958 http://dx.doi.org/10.1186/s12889-022-13168-y Text en © The Author(s) 2022 https://creativecommons.org/licenses/by/4.0/ Open Access: This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit https://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (https://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data. |
title | Clinical, social, and policy factors in COVID-19 cases and deaths: methodological considerations for feature selection and modeling in county-level analyses |
topic | Research |
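
The description field defines each outcome as cumulative incidence per 100,000 population over the first 30 to 180 days after a county reached one case per 10,000 residents. The sketch below illustrates one way that windowing could be computed from a county's daily cumulative series. It is not the authors' code: the function names, the input layout, and the convention that a window of w days includes the start day are all assumptions made for illustration.

```python
from typing import Dict, Optional, Sequence

# Period lengths (days) listed in the abstract.
WINDOWS = (30, 60, 90, 120, 150, 180)


def start_index(cum_cases: Sequence[int], population: int,
                threshold_per_10k: float = 1.0) -> Optional[int]:
    """Return the first day index on which cumulative cases per 10,000
    residents reach the threshold, or None if they never do."""
    for day, cases in enumerate(cum_cases):
        if cases * 10_000 / population >= threshold_per_10k:
            return day
    return None


def windowed_incidence(cum_counts: Sequence[int], cum_cases: Sequence[int],
                       population: int, threshold_per_10k: float = 1.0,
                       windows: Sequence[int] = WINDOWS) -> Dict[int, float]:
    """Cumulative count per 100,000 residents on the last day of each window.

    Assumption (not from the source): a window of w days includes the start
    day, and the outcome is the running total on the window's last day.
    Windows that extend past the available data are omitted.
    """
    start = start_index(cum_cases, population, threshold_per_10k)
    if start is None:
        return {}
    result = {}
    for w in windows:
        last_day = start + w - 1
        if last_day < len(cum_counts):
            result[w] = cum_counts[last_day] * 100_000 / population
    return result
```

Passing the death series as `cum_counts` and the case series as `cum_cases` would give death incidence anchored to the case-based start date, and setting `threshold_per_10k=10.0` would mirror the sensitivity analysis described in the abstract.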