Identification of Factors Associated With Variation in US County-Level Obesity Prevalence Rates Using Epidemiologic vs Machine Learning Models

IMPORTANCE: Obesity is a leading cause of high health care expenditures, disability, and premature mortality. Previous studies have documented geographic disparities in obesity prevalence. OBJECTIVE: To identify county-level factors associated with obesity using traditional epidemiologic and machine...

Bibliographic Details
Main Authors: Scheinker, David; Valencia, Areli; Rodriguez, Fatima
Format: Online Article Text
Language: English
Published: American Medical Association 2019
Subjects: Original Investigation
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6487629/
https://www.ncbi.nlm.nih.gov/pubmed/31026030
http://dx.doi.org/10.1001/jamanetworkopen.2019.2884
_version_ 1783414531324641280
author Scheinker, David
Valencia, Areli
Rodriguez, Fatima
collection PubMed
description IMPORTANCE: Obesity is a leading cause of high health care expenditures, disability, and premature mortality. Previous studies have documented geographic disparities in obesity prevalence.
OBJECTIVE: To identify county-level factors associated with obesity using traditional epidemiologic and machine learning methods.
DESIGN, SETTING, AND PARTICIPANTS: Cross-sectional study using linear regression models and machine learning models to evaluate the associations between county-level obesity and county-level demographic, socioeconomic, health care, and environmental factors from summarized statistical data extracted from the 2018 Robert Wood Johnson Foundation County Health Rankings and merged with US Census data from each of 3138 US counties. The explanatory power of the linear multivariate regression and the top-performing machine learning model were compared using mean R² measured in 30-fold cross-validation.
EXPOSURES: County-level demographic factors (population; rural status; census region; and race/ethnicity, sex, and age composition), socioeconomic factors (median income, unemployment rate, and percentage of population with some college education), health care factors (rate of uninsured adults and primary care physicians), and environmental factors (access to healthy foods and access to exercise opportunities).
MAIN OUTCOMES AND MEASURES: County-level obesity prevalence in 2018, its association with each county-level factor, and the percentage of variation in county-level obesity prevalence explained by linear multivariate and gradient boosting machine regression measured with R².
RESULTS: Among the 3138 counties studied, the mean (range) obesity prevalence was 31.5% (12.8%-47.8%). In multivariate regressions, demographic factors explained 44.9% of variation in obesity prevalence; socioeconomic factors, 33.0%; environmental factors, 15.5%; and health care factors, 9.1%. The county-level factors with the strongest association with obesity were census region, median household income, and percentage of population with some college education. R² values of univariate regressions of obesity prevalence were 0.238 for census region, 0.218 for median household income, and 0.160 for percentage of population with some college education. Multivariate linear regression and gradient boosting machine regression (the best-performing machine learning model) of obesity prevalence using all county-level demographic, socioeconomic, health care, and environmental factors had R² values of 0.58 and 0.66, respectively (P < .001).
CONCLUSIONS AND RELEVANCE: Obesity prevalence varies significantly between counties. County-level demographic, socioeconomic, health care, and environmental factors explain the majority of variation in county-level obesity prevalence. Using machine learning models may explain significantly more of the variation in obesity prevalence.
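The abstract compares models by mean R² measured in 30-fold cross-validation. The following is a minimal sketch of how such a score might be computed, not the authors' code: it uses only the Python standard library, a univariate least-squares line stands in for the study's multivariate and gradient boosting models, and the data are synthetic. The function names (`fit_line`, `r_squared`, `mean_cv_r2`) and all numeric parameters are assumptions made for illustration.

```python
import random
import statistics


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b


def r_squared(ys, preds):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    my = statistics.fmean(ys)
    ss_res = sum((y - p) ** 2 for y, p in zip(ys, preds))
    ss_tot = sum((y - my) ** 2 for y in ys)
    return 1.0 - ss_res / ss_tot


def mean_cv_r2(xs, ys, k=30, seed=0):
    """Mean held-out R^2 over k folds of a shuffled index split."""
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]  # k roughly equal folds
    scores = []
    for test in folds:
        held = set(test)
        train = [i for i in idx if i not in held]
        # Fit on the training folds only, then score on the held-out fold.
        a, b = fit_line([xs[i] for i in train], [ys[i] for i in train])
        preds = [a + b * xs[i] for i in test]
        scores.append(r_squared([ys[i] for i in test], preds))
    return statistics.fmean(scores)


# Synthetic stand-in data (NOT the study's data): one hypothetical
# county-level factor and a noisy linear outcome.
rng = random.Random(42)
xs = [rng.uniform(0, 10) for _ in range(3000)]
ys = [30.0 - 0.8 * x + rng.gauss(0, 2.0) for x in xs]
score = mean_cv_r2(xs, ys, k=30)
print(f"mean 30-fold R^2: {score:.3f}")
```

Scoring each fold on data the model did not see during fitting is what makes the averaged R² a fairer basis for comparing a flexible model with a linear one than an in-sample R² would be.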
format Online
Article
Text
id pubmed-6487629
institution National Center for Biotechnology Information
language English
publishDate 2019
publisher American Medical Association
record_format MEDLINE/PubMed
spelling pubmed-6487629 2019-05-03 Identification of Factors Associated With Variation in US County-Level Obesity Prevalence Rates Using Epidemiologic vs Machine Learning Models Scheinker, David Valencia, Areli Rodriguez, Fatima JAMA Netw Open Original Investigation American Medical Association 2019-04-26 /pmc/articles/PMC6487629/ /pubmed/31026030 http://dx.doi.org/10.1001/jamanetworkopen.2019.2884 Text en Copyright 2019 Scheinker D et al. JAMA Network Open. http://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the CC-BY License.
title Identification of Factors Associated With Variation in US County-Level Obesity Prevalence Rates Using Epidemiologic vs Machine Learning Models
topic Original Investigation