Location Prediction for Tweets

Bibliographic Details
Main Authors: Huang, Chieh-Yang; Tong, Hanghang; He, Jingrui; Maciejewski, Ross
Format: Online Article Text
Language: English
Published: Frontiers Media S.A. 2019
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7931908/
https://www.ncbi.nlm.nih.gov/pubmed/33693328
http://dx.doi.org/10.3389/fdata.2019.00005
_version_ 1783660380352937984
author Huang, Chieh-Yang
Tong, Hanghang
He, Jingrui
Maciejewski, Ross
author_facet Huang, Chieh-Yang
Tong, Hanghang
He, Jingrui
Maciejewski, Ross
author_sort Huang, Chieh-Yang
collection PubMed
description Geographic information provides important insight for many data mining and social media systems. However, users are reluctant to provide such information because of concerns such as inconvenience and privacy. In this paper, we aim to develop a deep-learning-based solution to predict geographic information for tweets. Current approaches have two major limitations: (a) they struggle to model long-term information, and (b) it is hard to explain to end users what the model learns. To address these issues, our proposed model embraces three key ideas. First, we introduce a multi-head self-attention model for text representation. Second, to further improve results on informal language, we treat subwords as a feature in our model. Lastly, the model is trained jointly on city and country labels to incorporate the information coming from both. Experiments on the W-NUT 2016 Geo-tagging shared task show that our proposed model is competitive with state-of-the-art systems in terms of accuracy, while achieving a better distance measure than existing approaches.
format Online
Article
Text
id pubmed-7931908
institution National Center for Biotechnology Information
language English
publishDate 2019
publisher Frontiers Media S.A.
record_format MEDLINE/PubMed
spelling pubmed-7931908 2021-03-09 Location Prediction for Tweets Huang, Chieh-Yang Tong, Hanghang He, Jingrui Maciejewski, Ross Front Big Data Big Data Frontiers Media S.A. 2019-05-24 /pmc/articles/PMC7931908/ /pubmed/33693328 http://dx.doi.org/10.3389/fdata.2019.00005 Text en Copyright © 2019 Huang, Tong, He and Maciejewski. http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
spellingShingle Big Data
Huang, Chieh-Yang
Tong, Hanghang
He, Jingrui
Maciejewski, Ross
Location Prediction for Tweets
title Location Prediction for Tweets
title_full Location Prediction for Tweets
title_fullStr Location Prediction for Tweets
title_full_unstemmed Location Prediction for Tweets
title_short Location Prediction for Tweets
title_sort location prediction for tweets
topic Big Data
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7931908/
https://www.ncbi.nlm.nih.gov/pubmed/33693328
http://dx.doi.org/10.3389/fdata.2019.00005
work_keys_str_mv AT huangchiehyang locationpredictionfortweets
AT tonghanghang locationpredictionfortweets
AT hejingrui locationpredictionfortweets
AT maciejewskiross locationpredictionfortweets