Multimethod, multidataset analysis reveals paradoxical relationships between sociodemographic factors, Hispanic ethnicity and diabetes

INTRODUCTION: Population-level and individual-level analyses have strengths and limitations as do ‘blackbox’ machine learning (ML) and traditional, interpretable models. Diabetes mellitus (DM) is a leading cause of morbidity and mortality with complex sociodemographic dynamics that have not been ana...

Bibliographic Details
Main Authors: Knight, Gabriel M; Spencer-Bonilla, Gabriela; Maahs, David M; Blum, Manuel R; Valencia, Areli; Zuma, Bongeka Z; Prahalad, Priya; Sarraju, Ashish; Rodriguez, Fatima; Scheinker, David
Format: Online Article Text
Language: English
Published: BMJ Publishing Group 2020
Subjects: Epidemiology/Health services research
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7684662/
https://www.ncbi.nlm.nih.gov/pubmed/33229378
http://dx.doi.org/10.1136/bmjdrc-2020-001725
_version_ 1783613044928020480
author Knight, Gabriel M
Spencer-Bonilla, Gabriela
Maahs, David M
Blum, Manuel R
Valencia, Areli
Zuma, Bongeka Z
Prahalad, Priya
Sarraju, Ashish
Rodriguez, Fatima
Scheinker, David
author_sort Knight, Gabriel M
collection PubMed
description INTRODUCTION: Population-level and individual-level analyses have strengths and limitations as do ‘blackbox’ machine learning (ML) and traditional, interpretable models. Diabetes mellitus (DM) is a leading cause of morbidity and mortality with complex sociodemographic dynamics that have not been analyzed in a way that leverages population-level and individual-level data as well as traditional epidemiological and ML models. We analyzed complementary individual-level and county-level datasets with both regression and ML methods to study the association between sociodemographic factors and DM. RESEARCH DESIGN AND METHODS: County-level DM prevalence, demographics, and socioeconomic status (SES) factors were extracted from the 2018 Robert Wood Johnson Foundation County Health Rankings and merged with US Census data. Analogous individual-level data were extracted from 2007 to 2016 National Health and Nutrition Examination Survey studies and corrected for oversampling with survey weights. We used multivariate linear (logistic) regression and ML regression (classification) models for county (individual) data. Regression and ML models were compared using measures of explained variation (area under the receiver operating characteristic curve (AUC) and R²). RESULTS: Among the 3138 counties assessed, the mean DM prevalence was 11.4% (range: 3.0%–21.1%). Among the 12 824 individuals assessed, 1688 met DM criteria (13.2% unweighted; 10.2% weighted). Age, gender, race/ethnicity, income, and education were associated with DM at the county and individual levels. Higher county Hispanic ethnic density was negatively associated with county DM prevalence, while Hispanic ethnicity was positively associated with individual DM. ML outperformed regression in both datasets (mean R² of 0.679 vs 0.610, respectively (p<0.001) for county-level data; mean AUC of 0.737 vs 0.727 (p<0.0427) for individual-level data). CONCLUSIONS: Hispanic individuals are at higher risk of DM, while counties with larger Hispanic populations have lower DM prevalence. Analyses of population-level and individual-level data with multiple methods may afford more confidence in results and identify areas for further study.
format Online
Article
Text
id pubmed-7684662
institution National Center for Biotechnology Information
language English
publishDate 2020
publisher BMJ Publishing Group
record_format MEDLINE/PubMed
spelling pubmed-7684662 2020-11-30 BMJ Open Diabetes Res Care Epidemiology/Health services research BMJ Publishing Group 2020-11-23 /pmc/articles/PMC7684662/ /pubmed/33229378 http://dx.doi.org/10.1136/bmjdrc-2020-001725 Text en © Author(s) (or their employer(s)) 2020. Re-use permitted under CC BY-NC. No commercial re-use. See rights and permissions. Published by BMJ. This is an open access article distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited, appropriate credit is given, any changes made indicated, and the use is non-commercial. See: http://creativecommons.org/licenses/by-nc/4.0/.
title Multimethod, multidataset analysis reveals paradoxical relationships between sociodemographic factors, Hispanic ethnicity and diabetes
topic Epidemiology/Health services research
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7684662/
https://www.ncbi.nlm.nih.gov/pubmed/33229378
http://dx.doi.org/10.1136/bmjdrc-2020-001725