Microblog topic identification using Linked Open Data
Main authors: | Yıldırım, Ahmet; Uskudarli, Suzan |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Public Library of Science, 2020 |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7418982/ https://www.ncbi.nlm.nih.gov/pubmed/32780736 http://dx.doi.org/10.1371/journal.pone.0236863 |
author | Yıldırım, Ahmet; Uskudarli, Suzan |
---|---|
collection | PubMed |
description | Much valuable information is embedded in social media posts (microposts), which are contributed by a wide variety of people about subjects that are of interest to others. The automated utilization of this information is challenging due to the overwhelming quantity of posts and the distribution of information about a subject across several posts. Numerous approaches have been proposed to detect topics from collections of microposts, where the topics are represented by lists of terms such as words, phrases, or word embeddings. Such topics are used in tasks like classification and recommendation. In such methods, interpreting the topics is treated as a separate task, although the topics are becoming increasingly human-interpretable. This work proposes an approach for identifying machine-interpretable topics of collective interest. We define a topic as a set of related elements that are associated by having been posted in the same contexts. To represent topics, we introduce an ontology specified according to the W3C recommended standards. The elements of the topics are identified by linking entities to resources published on Linked Open Data (LOD). Such a representation enables processing of topics to provide insights that go beyond what is explicitly expressed in the microposts. The feasibility of the proposed approach is examined by generating topics from more than one million tweets collected from Twitter during various events. The utility of these topics is demonstrated with a variety of topic-related tasks, along with a comparison of the effort required to perform the same tasks with word-list-based representations. Manual evaluation of 36 randomly selected sets of topics yielded precision and F1 scores of 81.0% and 93.3%, respectively. |
format | Online Article Text |
id | pubmed-7418982 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2020 |
publisher | Public Library of Science |
record_format | MEDLINE/PubMed |
spelling | pubmed-7418982 2020-08-19 Microblog topic identification using Linked Open Data Yıldırım, Ahmet; Uskudarli, Suzan PLoS One Research Article Public Library of Science 2020-08-11 /pmc/articles/PMC7418982/ /pubmed/32780736 http://dx.doi.org/10.1371/journal.pone.0236863 Text en © 2020 Yıldırım, Uskudarli http://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. |
title | Microblog topic identification using Linked Open Data |
topic | Research Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7418982/ https://www.ncbi.nlm.nih.gov/pubmed/32780736 http://dx.doi.org/10.1371/journal.pone.0236863 |
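The description defines a topic as a set of related elements that are associated by having been posted in the same contexts, where the elements are identified by linking entities to LOD resources. The following is a minimal sketch of that idea under simplifying assumptions that are not taken from the article. Each post is assumed to be already entity-linked to a list of DBpedia resource URIs. Two elements are treated as related when they co-occur in at least `min_cooccur` posts (a hypothetical threshold). Topics are taken to be the connected components of the resulting co-occurrence graph. The `http://example.org/topic#` namespace, the `Topic` class, and the `hasElement` property used for the N-Triples output are hypothetical placeholders, not the ontology introduced in the article.

```python
from collections import Counter
from itertools import combinations

# Hypothetical namespace standing in for the article's topic ontology (not the real one).
EX = "http://example.org/topic#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def cooccurrence_counts(posts):
    """Count how many posts each unordered pair of linked entities shares."""
    counts = Counter()
    for entities in posts:
        # Deduplicate and sort so each pair is counted once per post, in canonical order.
        for a, b in combinations(sorted(set(entities)), 2):
            counts[(a, b)] += 1
    return counts


def find_topics(posts, min_cooccur=2):
    """Group entities into topics: connected components of the co-occurrence graph.

    Assumption: only pairs seen together in at least `min_cooccur` posts form edges;
    entities without such an edge are left out of every topic.
    """
    parent = {}

    def find(x):
        # Union-find root lookup with path halving.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for (a, b), n in cooccurrence_counts(posts).items():
        if n >= min_cooccur:
            parent[find(a)] = find(b)  # merge the two components

    groups = {}
    for entity in parent:
        groups.setdefault(find(entity), set()).add(entity)
    # Sort elements and topics so the output is deterministic.
    return sorted((sorted(g) for g in groups.values()), key=lambda g: g[0])


def topic_to_ntriples(topic_id, elements):
    """Serialize one topic as N-Triples using the hypothetical vocabulary above."""
    subject = f"<{EX}{topic_id}>"
    lines = [f"{subject} <{RDF_TYPE}> <{EX}Topic> ."]
    lines += [f"{subject} <{EX}hasElement> <{uri}> ." for uri in elements]
    return "\n".join(lines)


if __name__ == "__main__":
    # Illustrative input: each post is assumed to be already entity-linked.
    DBR = "http://dbpedia.org/resource/"
    posts = [
        [DBR + "Istanbul", DBR + "Earthquake"],
        [DBR + "Istanbul", DBR + "Earthquake"],
        [DBR + "Earthquake", DBR + "Tsunami"],
        [DBR + "Earthquake", DBR + "Tsunami"],
        [DBR + "Football", DBR + "Istanbul"],  # co-occurs only once, below the threshold
    ]
    for i, topic in enumerate(find_topics(posts, min_cooccur=2)):
        print(topic_to_ntriples(f"topic{i}", topic))
```

Connected components are the simplest grouping to show here, but they can chain loosely related entities together through a shared hub such as a city name, so a stricter grouping (for example, cliques or a community-detection method) may suit real microblog data better. Emitting plain N-Triples keeps the sketch dependency-free; the resulting topics are ordinary RDF and can be loaded into any triple store and queried alongside the linked LOD resources.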