Multimethod, multidataset analysis reveals paradoxical relationships between sociodemographic factors, Hispanic ethnicity and diabetes
INTRODUCTION: Population-level and individual-level analyses have strengths and limitations as do ‘blackbox’ machine learning (ML) and traditional, interpretable models. Diabetes mellitus (DM) is a leading cause of morbidity and mortality with complex sociodemographic dynamics that have not been ana...
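Main authors: | Knight, Gabriel M; Spencer-Bonilla, Gabriela; Maahs, David M; Blum, Manuel R; Valencia, Areli; Zuma, Bongeka Z; Prahalad, Priya; Sarraju, Ashish; Rodriguez, Fatima; Scheinker, David |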
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BMJ Publishing Group 2020 |
Subjects: | Epidemiology/Health services research |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7684662/ https://www.ncbi.nlm.nih.gov/pubmed/33229378 http://dx.doi.org/10.1136/bmjdrc-2020-001725 |
_version_ | 1783613044928020480 |
---|---|
author | Knight, Gabriel M Spencer-Bonilla, Gabriela Maahs, David M Blum, Manuel R Valencia, Areli Zuma, Bongeka Z Prahalad, Priya Sarraju, Ashish Rodriguez, Fatima Scheinker, David |
author_sort | Knight, Gabriel M |
collection | PubMed |
description | INTRODUCTION: Population-level and individual-level analyses have strengths and limitations as do ‘blackbox’ machine learning (ML) and traditional, interpretable models. Diabetes mellitus (DM) is a leading cause of morbidity and mortality with complex sociodemographic dynamics that have not been analyzed in a way that leverages population-level and individual-level data as well as traditional epidemiological and ML models. We analyzed complementary individual-level and county-level datasets with both regression and ML methods to study the association between sociodemographic factors and DM. RESEARCH DESIGN AND METHODS: County-level DM prevalence, demographics, and socioeconomic status (SES) factors were extracted from the 2018 Robert Wood Johnson Foundation County Health Rankings and merged with US Census data. Analogous individual-level data were extracted from 2007 to 2016 National Health and Nutrition Examination Survey studies and corrected for oversampling with survey weights. We used multivariate linear (logistic) regression and ML regression (classification) models for county (individual) data. Regression and ML models were compared using measures of explained variation (area under the receiver operating characteristic curve (AUC) and R²). RESULTS: Among the 3138 counties assessed, the mean DM prevalence was 11.4% (range: 3.0%–21.1%). Among the 12 824 individuals assessed, 1688 met DM criteria (13.2% unweighted; 10.2% weighted). Age, gender, race/ethnicity, income, and education were associated with DM at the county and individual levels. Higher county Hispanic ethnic density was negatively associated with county DM prevalence, while Hispanic ethnicity was positively associated with individual DM. ML outperformed regression in both datasets (mean R² of 0.679 vs 0.610, respectively (p<0.001) for county-level data; mean AUC of 0.737 vs 0.727 (p<0.0427) for individual-level data). CONCLUSIONS: Hispanic individuals are at higher risk of DM, while counties with larger Hispanic populations have lower DM prevalence. Analyses of population-level and individual-level data with multiple methods may afford more confidence in results and identify areas for further study. |
format | Online Article Text |
id | pubmed-7684662 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2020 |
publisher | BMJ Publishing Group |
record_format | MEDLINE/PubMed |
spelling | pubmed-7684662 2020-11-30 Multimethod, multidataset analysis reveals paradoxical relationships between sociodemographic factors, Hispanic ethnicity and diabetes Knight, Gabriel M Spencer-Bonilla, Gabriela Maahs, David M Blum, Manuel R Valencia, Areli Zuma, Bongeka Z Prahalad, Priya Sarraju, Ashish Rodriguez, Fatima Scheinker, David BMJ Open Diabetes Res Care Epidemiology/Health services research BMJ Publishing Group 2020-11-23 /pmc/articles/PMC7684662/ /pubmed/33229378 http://dx.doi.org/10.1136/bmjdrc-2020-001725 Text en © Author(s) (or their employer(s)) 2020. Re-use permitted under CC BY-NC. No commercial re-use. See rights and permissions. Published by BMJ. http://creativecommons.org/licenses/by-nc/4.0/ This is an open access article distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited, appropriate credit is given, any changes made indicated, and the use is non-commercial. See: http://creativecommons.org/licenses/by-nc/4.0/. |
title | Multimethod, multidataset analysis reveals paradoxical relationships between sociodemographic factors, Hispanic ethnicity and diabetes |
topic | Epidemiology/Health services research |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7684662/ https://www.ncbi.nlm.nih.gov/pubmed/33229378 http://dx.doi.org/10.1136/bmjdrc-2020-001725 |
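
The description reports that 1688 of 12 824 individuals met DM criteria, or 13.2% unweighted and 10.2% after applying survey weights. The record does not say how the authors computed the weighted figure. The sketch below only illustrates the general idea of a weighted proportion, which can fall below the raw proportion when oversampled groups carry smaller weights. The function name `prevalence` and the toy values are hypothetical and are not data from the study. A full survey analysis would also need design-based variance estimation, which is not shown here.

```python
def prevalence(outcomes, weights=None):
    """Return the share of positive outcomes, optionally survey-weighted.

    outcomes: iterable of 0/1 flags (1 = meets the disease criteria).
    weights:  iterable of non-negative weights of the same length;
              None means every respondent counts equally.
    """
    outcomes = list(outcomes)
    if weights is None:
        weights = [1.0] * len(outcomes)
    else:
        weights = list(weights)
    if len(weights) != len(outcomes):
        raise ValueError("outcomes and weights must have the same length")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    # Weighted mean of the 0/1 outcome flag.
    return sum(w * y for w, y in zip(weights, outcomes)) / total


# Synthetic toy sample (assumption for illustration, NOT study data):
# the two positive respondents come from an oversampled group, so each
# stands for fewer people in the population and gets a smaller weight.
has_dm = [1, 1, 0, 0, 0]
survey_weight = [1.0, 1.0, 3.0, 3.0, 2.0]

unweighted = prevalence(has_dm)
weighted = prevalence(has_dm, survey_weight)
print(f"unweighted: {unweighted:.1%}, weighted: {weighted:.1%}")
```

In this toy sample the weighted share is lower than the unweighted one, the same direction as the gap between the two figures quoted in the description.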