Identifying Topics in Microblogs Using Wikipedia
Twitter is an extremely high-volume platform for user-generated contributions on any topic. The wealth of content created in real time and in massive quantities calls for automated approaches to identify the topics of the contributions. Such topics can be utilized in numerous ways, such as public...
Main Authors: | Yıldırım, Ahmet; Üsküdarlı, Suzan; Özgür, Arzucan |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Public Library of Science, 2016 |
Subjects: | |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4798765/ https://www.ncbi.nlm.nih.gov/pubmed/26991442 http://dx.doi.org/10.1371/journal.pone.0151885 |
_version_ | 1782422218018390016 |
---|---|
author | Yıldırım, Ahmet Üsküdarlı, Suzan Özgür, Arzucan |
author_facet | Yıldırım, Ahmet Üsküdarlı, Suzan Özgür, Arzucan |
author_sort | Yıldırım, Ahmet |
collection | PubMed |
description | Twitter is an extremely high-volume platform for user-generated contributions on any topic. The wealth of content created in real time and in massive quantities calls for automated approaches to identify the topics of the contributions. Such topics can be utilized in numerous ways, such as public opinion mining, marketing, entertainment, and disaster management. Towards this end, approaches that relate single or partial posts to knowledge base items have been proposed. However, in microblogging systems like Twitter, topics emerge from the culmination of a large number of contributions. Therefore, it is necessary to identify topics based on collections of posts, where individual posts contribute to some aspect of the greater topic. Models such as Latent Dirichlet Allocation (LDA) propose algorithms for relating collections of posts to sets of keywords that represent underlying topics. In these approaches, determining which specific topic(s) the keyword sets represent remains a separate task. Another issue in topic detection is the scope, which is often limited to a specific domain, such as health. This work proposes an approach for identifying domain-independent specific topics related to sets of posts. In this approach, individual posts are processed and then aggregated to identify key tokens, which are then mapped to specific topics. Wikipedia article titles are selected to represent topics, since they are up-to-date, user-generated, sophisticated articles that span topics of human interest. This paper describes the proposed approach, a prototype implementation, and a case study based on data gathered during the heavily contributed periods corresponding to the four US election debates in 2012. The manually evaluated results (0.96 precision) and other observations from the study are discussed in detail. |
format | Online Article Text |
id | pubmed-4798765 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2016 |
publisher | Public Library of Science |
record_format | MEDLINE/PubMed |
spelling | pubmed-4798765 2016-03-23 Identifying Topics in Microblogs Using Wikipedia Yıldırım, Ahmet Üsküdarlı, Suzan Özgür, Arzucan PLoS One Research Article
Public Library of Science 2016-03-18 /pmc/articles/PMC4798765/ /pubmed/26991442 http://dx.doi.org/10.1371/journal.pone.0151885 Text en © 2016 Yıldırım et al http://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/) , which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. |
spellingShingle | Research Article Yıldırım, Ahmet Üsküdarlı, Suzan Özgür, Arzucan Identifying Topics in Microblogs Using Wikipedia |
title | Identifying Topics in Microblogs Using Wikipedia |
title_full | Identifying Topics in Microblogs Using Wikipedia |
title_fullStr | Identifying Topics in Microblogs Using Wikipedia |
title_full_unstemmed | Identifying Topics in Microblogs Using Wikipedia |
title_short | Identifying Topics in Microblogs Using Wikipedia |
title_sort | identifying topics in microblogs using wikipedia |
topic | Research Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4798765/ https://www.ncbi.nlm.nih.gov/pubmed/26991442 http://dx.doi.org/10.1371/journal.pone.0151885 |
work_keys_str_mv | AT yıldırımahmet identifyingtopicsinmicroblogsusingwikipedia AT uskudarlısuzan identifyingtopicsinmicroblogsusingwikipedia AT ozgurarzucan identifyingtopicsinmicroblogsusingwikipedia |
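
The description above outlines a pipeline in which individual posts are processed, aggregated into key tokens, and then mapped to Wikipedia article titles. The following is a minimal sketch of that general idea only, not the authors' prototype: the stopword list, the `min_posts` threshold, the subset-matching rule, and the tiny `TITLES` list (standing in for a real Wikipedia title index) are all illustrative assumptions.

```python
import re
from collections import Counter

# Hypothetical stand-in for a Wikipedia title index (assumption).
TITLES = ["Barack Obama", "Mitt Romney", "Health care reform",
          "Unemployment", "Climate change"]

# Small illustrative stopword list (assumption, not from the paper).
STOPWORDS = {"the", "a", "an", "on", "is", "and", "of", "to", "in", "about", "rt"}

SAMPLE_POSTS = [
    "Obama talks about health care reform #debate",
    "RT Romney on unemployment and health care",
    "Barack Obama and Mitt Romney on unemployment http://t.co/x",
    "Mitt Romney attacks Barack Obama on care reform",
]


def tokenize(post):
    """Process one post: lowercase, drop URLs, keep alphabetic non-stopwords."""
    text = re.sub(r"https?://\S+", " ", post.lower())
    return [t for t in re.findall(r"[a-z]+", text) if t not in STOPWORDS]


def key_tokens(posts, min_posts=2):
    """Aggregate posts: keep tokens that occur in at least `min_posts` posts."""
    counts = Counter()
    for post in posts:
        counts.update(set(tokenize(post)))  # count each token once per post
    return {token for token, n in counts.items() if n >= min_posts}


def map_to_titles(tokens, titles=TITLES):
    """Map key tokens to titles whose words are all among the key tokens."""
    return [title for title in titles
            if set(title.lower().split()) <= tokens]


if __name__ == "__main__":
    print(map_to_titles(key_tokens(SAMPLE_POSTS)))
    # ['Barack Obama', 'Mitt Romney', 'Health care reform', 'Unemployment']
```

Counting each token once per post means a topic has to be supported by several posts rather than by one repetitive post, which mirrors the point that topics emerge from collections of contributions rather than from single posts.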