Empowering health geography research with location-based social media data: innovative food word expansion and energy density prediction via word embedding and machine learning

BACKGROUND: The exponential growth of location-based social media (LBSM) data has ushered in novel prospects for investigating the urban food environment in health geography research. However, previous studies have primarily relied on word dictionaries with a limited number of food words and employe...

Bibliographic Details
Main Authors: Wang, Jue; Kim, Gyoorie; Chang, Kevin Chen-Chuan
Format: Online Article Text
Language: English
Published: BioMed Central 2023
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10505329/
https://www.ncbi.nlm.nih.gov/pubmed/37716950
http://dx.doi.org/10.1186/s12942-023-00344-5
author Wang, Jue
Kim, Gyoorie
Chang, Kevin Chen-Chuan
collection PubMed
description BACKGROUND: The exponential growth of location-based social media (LBSM) data has opened new opportunities for investigating the urban food environment in health geography research. However, previous studies have primarily relied on word dictionaries with a limited number of food words and used common-sense categorizations to determine the healthiness of those words. To improve the analysis of the urban food environment using LBSM data, a more comprehensive list of food-related words is needed. In this context, this study explores expanding the set of food-related words along with their associated energy densities.
METHODS: This study addresses this research gap by introducing a novel methodology for expanding the food-related word dictionary and predicting energy densities. Seed words are generated from official and crowdsourced food composition databases, and new food words are discovered by clustering food words within the word embedding space using the Gaussian mixture model. Machine learning models are employed to predict the energy density classifications of these food words based on their feature vectors. To explore the prediction problem thoroughly, ten widely used machine learning models are evaluated.
RESULTS: The approach successfully expands the food-related word dictionary and accurately predicts food energy density (reaching 91.62%). A comparison of the newly expanded dictionary with the initial seed words, together with an analysis of Yelp reviews in the city of Toronto, shows significant improvements in identifying food words and a deeper understanding of the food environment.
CONCLUSIONS: This study proposes a novel method to expand food-related vocabulary and predict food energy density based on machine learning and word embedding. This method contributes to building a more comprehensive list of food words that can be used in geography and public health studies that mine geotagged social media data.
SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1186/s12942-023-00344-5.
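The dictionary-expansion step described under METHODS can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how the Gaussian mixture model is configured or how new words are selected. The sketch assumes a spherical-covariance mixture fitted to the seed-word vectors, a hypothetical acceptance rule (a candidate is kept if its log-likelihood is at least as high as that of the lowest-scoring seed word), and made-up two-dimensional "embeddings" in place of real word vectors. All function and variable names are illustrative.

```python
import math

def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def component_logs(x, weights, means, variances):
    """Log of weight * spherical Gaussian density, one value per component."""
    d = len(x)
    return [
        math.log(w) - 0.5 * d * math.log(2 * math.pi * v) - sq_dist(x, m) / (2 * v)
        for w, m, v in zip(weights, means, variances)
    ]

def log_likelihood(x, params):
    """Log-likelihood of x under the mixture (log-sum-exp over components)."""
    logs = component_logs(x, *params)
    top = max(logs)
    return top + math.log(sum(math.exp(l - top) for l in logs))

def fit_gmm(points, k, iters=50, min_var=1e-3):
    """Fit a spherical Gaussian mixture with plain EM (illustrative only)."""
    d = len(points[0])
    # Deterministic farthest-point initialisation of the component means.
    means = [list(points[0])]
    while len(means) < k:
        means.append(list(max(points, key=lambda p: min(sq_dist(p, m) for m in means))))
    weights = [1.0 / k] * k
    variances = [1.0] * k
    for _ in range(iters):
        # E-step: responsibility of each component for each point.
        resp = []
        for x in points:
            logs = component_logs(x, weights, means, variances)
            top = max(logs)
            exps = [math.exp(l - top) for l in logs]
            total = sum(exps)
            resp.append([e / total for e in exps])
        # M-step: re-estimate weights, means, and variances.
        for j in range(k):
            nj = max(sum(r[j] for r in resp), 1e-12)
            weights[j] = nj / len(points)
            means[j] = [sum(r[j] * x[i] for r, x in zip(resp, points)) / nj for i in range(d)]
            spread = sum(r[j] * sq_dist(x, means[j]) for r, x in zip(resp, points))
            variances[j] = max(min_var, spread / (nj * d))
    return weights, means, variances

# Made-up 2-D vectors standing in for real word embeddings (assumption).
seed_vectors = {
    "cake": (1.0, 1.0), "cookie": (1.2, 0.9), "donut": (0.9, 1.1), "brownie": (1.1, 1.2),
    "salad": (3.0, 1.0), "kale": (3.1, 1.2), "spinach": (2.9, 0.9), "lettuce": (3.2, 1.0),
}
candidate_vectors = {
    "tiramisu": (1.05, 1.05), "arugula": (3.05, 1.0),
    "parking": (2.0, 4.0), "invoice": (5.0, 4.5),
}

params = fit_gmm(list(seed_vectors.values()), k=2)
# Hypothetical rule: accept candidates at least as likely as the weakest seed.
threshold = min(log_likelihood(v, params) for v in seed_vectors.values())
new_words = sorted(w for w, v in candidate_vectors.items()
                   if log_likelihood(v, params) >= threshold)
print(new_words)
```

In practice the vectors would come from a trained word-embedding model and the mixture would more likely be fitted with an established library; the hand-written EM loop is used here only to keep the sketch free of dependencies.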
format Online
Article
Text
id pubmed-10505329
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-10505329 2023-09-18 Empowering health geography research with location-based social media data: innovative food word expansion and energy density prediction via word embedding and machine learning Wang, Jue; Kim, Gyoorie; Chang, Kevin Chen-Chuan Int J Health Geogr Research
BioMed Central 2023-09-16 /pmc/articles/PMC10505329/ /pubmed/37716950 http://dx.doi.org/10.1186/s12942-023-00344-5 Text en
© The Author(s) 2023. Open Access: This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
title Empowering health geography research with location-based social media data: innovative food word expansion and energy density prediction via word embedding and machine learning
topic Research