Microblog topic identification using Linked Open Data

Bibliographic Details
Main authors: Yıldırım, Ahmet; Uskudarli, Suzan
Format: Online, Article, Text
Language: English
Published: Public Library of Science, 2020
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7418982/
https://www.ncbi.nlm.nih.gov/pubmed/32780736
http://dx.doi.org/10.1371/journal.pone.0236863
author Yıldırım, Ahmet; Uskudarli, Suzan
collection PubMed
description Much valuable information is embedded in social media posts (microposts), which are contributed by a great variety of people about subjects that are of interest to others. The automated utilization of this information is challenging due to the overwhelming quantity of posts and the distribution of the information related to a subject across several posts. Numerous approaches have been proposed to detect topics from collections of microposts, where the topics are represented by lists of terms such as words, phrases, or word embeddings. Such topics are used in tasks like classification and recommendation. In such methods, the interpretation of topics is considered a separate task, although the topics are becoming increasingly human-interpretable. This work proposes an approach for identifying machine-interpretable topics of collective interest. We define a topic as a set of related elements that are associated by having been posted in the same contexts. To represent topics, we introduce an ontology specified according to W3C-recommended standards. The elements of the topics are identified by linking entities to resources published on Linked Open Data (LOD). Such a representation enables topics to be processed to provide insights that go beyond what is explicitly expressed in the microposts. The feasibility of the proposed approach is examined by generating topics from more than one million tweets collected from Twitter during various events. The utility of these topics is demonstrated with a variety of topic-related tasks, along with a comparison of the effort required to perform the same tasks with word-list-based representations. Manual evaluation of 36 randomly selected sets of topics yielded precision and F1 scores of 81.0% and 93.3%, respectively.
format Online; Article; Text
id pubmed-7418982
institution National Center for Biotechnology Information
language English
publishDate 2020
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-7418982 2020-08-19 Microblog topic identification using Linked Open Data. Yıldırım, Ahmet; Uskudarli, Suzan. PLoS One. Research Article.
Public Library of Science 2020-08-11 /pmc/articles/PMC7418982/ /pubmed/32780736 http://dx.doi.org/10.1371/journal.pone.0236863 Text en © 2020 Yıldırım, Uskudarli http://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
title Microblog topic identification using Linked Open Data
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7418982/
https://www.ncbi.nlm.nih.gov/pubmed/32780736
http://dx.doi.org/10.1371/journal.pone.0236863