Building a National Neighborhood Dataset From Geotagged Twitter Data for Indicators of Happiness, Diet, and Physical Activity

Bibliographic Details
Main authors: Nguyen, Quynh C; Li, Dapeng; Meng, Hsien-Wen; Kath, Suraj; Nsoesie, Elaine; Li, Feifei; Wen, Ming
Format: Online Article, Text
Language: English
Published: JMIR Publications 2016
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5088343/
https://www.ncbi.nlm.nih.gov/pubmed/27751984
http://dx.doi.org/10.2196/publichealth.5869

Abstract

BACKGROUND: Studies suggest that where people live, play, and work can influence health and well-being. However, the dearth of neighborhood data, especially data that are timely and consistent across geographies, hinders understanding of the effects of neighborhoods on health. Social media data represent a possible new resource for neighborhood research.

OBJECTIVE: The aim of this study was to build, from geotagged Twitter data, a national neighborhood database with area-level indicators of well-being and health behaviors.

METHODS: We used Twitter's streaming application programming interface (API) to continuously collect a random 1% subset of publicly available geolocated tweets for 1 year (April 2015 to March 2016). We collected 80 million geotagged tweets from 603,363 unique Twitter users across the contiguous United States. We validated our machine learning algorithms for constructing indicators of happiness, food, and physical activity by comparing their predicted labels with those assigned by human labelers. Geotagged tweets were spatially mapped to the 2010 census tracts and zip code areas in which they fell, which enabled further assessment of the associations between Twitter-derived neighborhood variables and neighborhood demographic, economic, business, and health characteristics.

RESULTS: Agreement between machine-labeled and manually labeled tweets was high: for dichotomized labels, accuracy was 78% for happiness, 83% for food, and 85% for physical activity, with F scores of 0.54, 0.86, and 0.90, respectively. About 20% of tweets were classified as happy. Relatively few terms (fewer than 25) were needed to characterize the majority of tweets on food and physical activity. Data from over 70,000 US census tracts suggest that census tract factors such as percentage African American and economic disadvantage were associated with lower census tract happiness. Urbanicity was related to a higher frequency of fast food tweets, and greater numbers of fast food restaurants predicted a higher frequency of fast food mentions. Surprisingly, fitness centers and nature parks were only modestly associated with a higher frequency of physical activity tweets. Greater state-level happiness, positivity toward physical activity, and positivity toward healthy foods, assessed via tweets, were associated with lower all-cause mortality, lower prevalence of chronic conditions such as obesity and diabetes, and lower physical inactivity and smoking, controlling for state median income, median age, and percentage white non-Hispanic.

CONCLUSIONS: Machine learning algorithms can be built with relatively high accuracy to characterize sentiment, food, and physical activity mentions on social media. Such data can be used to construct neighborhood indicators consistently and cost-effectively. Access to neighborhood data, in turn, can be leveraged to better understand neighborhood effects and address social determinants of health. We found that neighborhoods with social and economic disadvantage, high urbanicity, and more fast food restaurants may exhibit lower happiness and fewer healthy behaviors.
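The abstract reports two technical steps: validating machine labels against human labels (accuracy and F scores) and spatially assigning geotagged tweets to areas before computing area-level indicators. The Python sketch below illustrates both under stated assumptions. It is not the authors' pipeline. It treats "F score" as the F1 score of the positive class (the abstract does not specify the variant). It uses a plain ray-casting point-in-polygon test on toy square "tracts" instead of real 2010 census tract boundaries, and the function names (`accuracy_and_f1`, `point_in_polygon`, `tract_happiness`) are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, float]  # (longitude, latitude) in this sketch


def accuracy_and_f1(predicted: Sequence[int], gold: Sequence[int]) -> Tuple[float, float]:
    """Compare machine labels with human labels (1 = positive class, e.g. 'happy').

    Assumption: "F score" is read as F1 for the positive class.
    """
    if len(predicted) != len(gold) or not gold:
        raise ValueError("label sequences must be non-empty and equal in length")
    tp = sum(1 for p, g in zip(predicted, gold) if p == 1 and g == 1)
    fp = sum(1 for p, g in zip(predicted, gold) if p == 1 and g == 0)
    fn = sum(1 for p, g in zip(predicted, gold) if p == 0 and g == 1)
    correct = sum(1 for p, g in zip(predicted, gold) if p == g)
    accuracy = correct / len(gold)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return accuracy, f1


def point_in_polygon(pt: Point, polygon: List[Point]) -> bool:
    """Ray casting: toggle 'inside' each time a rightward ray from pt crosses an edge.

    Points lying exactly on a boundary may be classified either way.
    """
    x, y = pt
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # edge straddles the ray's y coordinate
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def tract_happiness(
    tweets: Sequence[Tuple[Point, int]], tracts: Dict[str, List[Point]]
) -> Dict[str, float]:
    """Assign each tweet to the first tract containing it; return the share of happy tweets.

    Tweets falling outside every tract are dropped; tracts with no tweets are omitted.
    """
    counts: Dict[str, List[int]] = {tid: [0, 0] for tid in tracts}  # [happy, total]
    for pt, happy in tweets:
        for tid, poly in tracts.items():
            if point_in_polygon(pt, poly):
                counts[tid][0] += happy
                counts[tid][1] += 1
                break
    return {tid: h / n for tid, (h, n) in counts.items() if n}
```

With a rare positive class, the two metrics can diverge: for human labels `[1, 1, 0, 0, 0, 0, 0, 0, 0, 0]` and machine labels `[1, 0, 1, 0, 0, 0, 0, 0, 0, 0]`, `accuracy_and_f1` returns an accuracy of 0.8 but an F1 of 0.5, because the many correctly labeled negatives raise accuracy without affecting F1. At national scale, a GIS library's spatial join over real tract polygons, with a spatial index, would normally replace the nested loop used here.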

Journal: JMIR Public Health and Surveillance (Original Paper), published 17 October 2016.

©Quynh C Nguyen, Dapeng Li, Hsien-Wen Meng, Suraj Kath, Elaine Nsoesie, Feifei Li, Ming Wen. Originally published in JMIR Public Health and Surveillance (http://publichealth.jmir.org), 17.10.2016.

This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/2.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in JMIR Public Health and Surveillance, is properly cited. The complete bibliographic information, a link to the original publication on http://publichealth.jmir.org, as well as this copyright and license information must be included.