Location Prediction for Tweets
Geographic information provides important insight for many data mining and social media systems. However, users are reluctant to provide such information because of concerns such as inconvenience and privacy. In this paper, we aim to develop a deep learning based solution to predict geogr...
Main Authors: | Huang, Chieh-Yang, Tong, Hanghang, He, Jingrui, Maciejewski, Ross |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Frontiers Media S.A. 2019 |
Subjects: | Big Data |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7931908/ https://www.ncbi.nlm.nih.gov/pubmed/33693328 http://dx.doi.org/10.3389/fdata.2019.00005 |
_version_ | 1783660380352937984 |
---|---|
author | Huang, Chieh-Yang Tong, Hanghang He, Jingrui Maciejewski, Ross |
author_facet | Huang, Chieh-Yang Tong, Hanghang He, Jingrui Maciejewski, Ross |
author_sort | Huang, Chieh-Yang |
collection | PubMed |
description | Geographic information provides important insight for many data mining and social media systems. However, users are reluctant to provide such information because of concerns such as inconvenience and privacy. In this paper, we aim to develop a deep learning based solution to predict geographic information for tweets. Current approaches have two major limitations: (a) they struggle to model long-term information, and (b) it is hard to explain to end users what the model learns. To address these issues, our proposed model embraces three key ideas. First, we introduce a multi-head self-attention model for text representation. Second, to further improve results on informal language, we treat subwords as a feature in our model. Lastly, the model is trained jointly on city and country labels to incorporate the information coming from different labels. Experiments on the W-NUT 2016 Geo-tagging shared task show that our proposed model is competitive with state-of-the-art systems in terms of accuracy while achieving a better distance measure than existing approaches. |
format | Online Article Text |
id | pubmed-7931908 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2019 |
publisher | Frontiers Media S.A. |
record_format | MEDLINE/PubMed |
spelling | pubmed-7931908 2021-03-09 Location Prediction for Tweets Huang, Chieh-Yang Tong, Hanghang He, Jingrui Maciejewski, Ross Front Big Data Big Data Geographic information provides important insight for many data mining and social media systems. However, users are reluctant to provide such information because of concerns such as inconvenience and privacy. In this paper, we aim to develop a deep learning based solution to predict geographic information for tweets. Current approaches have two major limitations: (a) they struggle to model long-term information, and (b) it is hard to explain to end users what the model learns. To address these issues, our proposed model embraces three key ideas. First, we introduce a multi-head self-attention model for text representation. Second, to further improve results on informal language, we treat subwords as a feature in our model. Lastly, the model is trained jointly on city and country labels to incorporate the information coming from different labels. Experiments on the W-NUT 2016 Geo-tagging shared task show that our proposed model is competitive with state-of-the-art systems in terms of accuracy while achieving a better distance measure than existing approaches. Frontiers Media S.A. 2019-05-24 /pmc/articles/PMC7931908/ /pubmed/33693328 http://dx.doi.org/10.3389/fdata.2019.00005 Text en Copyright © 2019 Huang, Tong, He and Maciejewski. http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms. |
spellingShingle | Big Data Huang, Chieh-Yang Tong, Hanghang He, Jingrui Maciejewski, Ross Location Prediction for Tweets |
title | Location Prediction for Tweets |
title_sort | location prediction for tweets |
topic | Big Data |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7931908/ https://www.ncbi.nlm.nih.gov/pubmed/33693328 http://dx.doi.org/10.3389/fdata.2019.00005 |