
Ensemble machine learning of factors influencing COVID-19 across US counties

Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), the causal agent of COVID-19, is spread through close contact. It is known to disproportionately impact certain communities due to both biological susceptibility and inequitable exposure. In this study, we investiga...


Bibliographic Details
Main Authors: McCoy, David, Mgbara, Whitney, Horvitz, Nir, Getz, Wayne M., Hubbard, Alan
Format: Online Article Text
Language: English
Published: Nature Publishing Group UK 2021
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8175420/
https://www.ncbi.nlm.nih.gov/pubmed/34083563
http://dx.doi.org/10.1038/s41598-021-90827-x
_version_ 1783703053821542400
author McCoy, David
Mgbara, Whitney
Horvitz, Nir
Getz, Wayne M.
Hubbard, Alan
author_sort McCoy, David
collection PubMed
description Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), the causal agent of COVID-19, is spread through close contact. It is known to disproportionately impact certain communities due to both biological susceptibility and inequitable exposure. In this study, we investigate the most important health, social, and environmental factors impacting the early phases (before July 2020) of per capita COVID-19 transmission and per capita all-cause mortality in US counties. We aggregate county-level physical and mental health, environmental pollution, access to health care, demographic characteristics, vulnerable population scores, and other epidemiological data to create a large feature set to analyze per capita COVID-19 outcomes. Because of the high dimensionality, multicollinearity, and unknown interactions of the data, we use ensemble machine learning and marginal prediction methods to identify the most salient factors associated with several COVID-19 outbreak measures. Our variable importance results show that measures of ethnicity, public transportation, and preventable diseases are the strongest predictors for both per capita COVID-19 incidence and mortality. Specifically, the CDC measures for minority populations, CDC measures for limited English, and the proportion of Black and/or African-American individuals in a county were the most important features for per capita COVID-19 cases within a month after the pandemic started in a county and also at the latest date examined. For per capita all-cause mortality at day 100 and total to date, we find that public transportation use and the proportion of Black and/or African-American individuals in a county are the strongest predictors.
The methods predict that a 10% increase in public transportation use, with all other factors held fixed at their observed values, is associated with an increase in mortality at day 100 of 2012 individuals (95% CI [1972, 2356]); likewise, a 10% increase in the proportion of Black and/or African-American individuals in a county is associated with an increase in total deaths at the end of the study of 2067 (95% CI [1189, 2654]). Using data until the end of the study, the same metric suggests that ethnicity has double the association of the next most important factors, which are location, disease prevalence, and transit factors. Our findings shed light on societal patterns that have been reported and experienced in the U.S. by using robust methods to understand the features most responsible for transmission and the sectors of society most vulnerable to infection and mortality. In particular, our results provide evidence of the disproportionate impact of the COVID-19 pandemic on minority populations. Our results suggest that mitigation measures, including how vaccines are distributed, could have the greatest impact if they are given with priority to the highest-risk communities.
format Online
Article
Text
id pubmed-8175420
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher Nature Publishing Group UK
record_format MEDLINE/PubMed
spelling pubmed-8175420 2021-06-04 Ensemble machine learning of factors influencing COVID-19 across US counties McCoy, David Mgbara, Whitney Horvitz, Nir Getz, Wayne M. Hubbard, Alan Sci Rep Article Nature Publishing Group UK 2021-06-03 /pmc/articles/PMC8175420/ /pubmed/34083563 http://dx.doi.org/10.1038/s41598-021-90827-x Text en © The Author(s) 2021 https://creativecommons.org/licenses/by/4.0/ Open Access. This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material.
If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit https://creativecommons.org/licenses/by/4.0/.
title Ensemble machine learning of factors influencing COVID-19 across US counties
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8175420/
https://www.ncbi.nlm.nih.gov/pubmed/34083563
http://dx.doi.org/10.1038/s41598-021-90827-x