
Using Topic Modeling Methods for Short-Text Data: A Comparative Analysis


Bibliographic Details
Main authors: Albalawi, Rania; Yeap, Tet Hin; Benyoucef, Morad
Format: Online Article Text
Language: English
Published: Frontiers Media S.A. 2020
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7861298/
https://www.ncbi.nlm.nih.gov/pubmed/33733159
http://dx.doi.org/10.3389/frai.2020.00042
author Albalawi, Rania
Yeap, Tet Hin
Benyoucef, Morad
collection PubMed
description With the growth of online social network platforms and applications, large amounts of textual user-generated content are created daily in the form of comments, reviews, and short-text messages. As a result, users often find it challenging to discover useful information or more on the topic being discussed from such content. Machine learning and natural language processing algorithms are used to analyze the massive amount of textual social media data available online, including topic modeling techniques that have gained popularity in recent years. This paper investigates the topic modeling subject and its common application areas, methods, and tools. Also, we examine and compare five frequently used topic modeling methods, as applied to short textual social data, to show their benefits practically in detecting important topics. These methods are latent semantic analysis, latent Dirichlet allocation, non-negative matrix factorization, random projection, and principal component analysis. Two textual datasets were selected to evaluate the performance of included topic modeling methods based on the topic quality and some standard statistical evaluation metrics, like recall, precision, F-score, and topic coherence. As a result, latent Dirichlet allocation and non-negative matrix factorization methods delivered more meaningful extracted topics and obtained good results. The paper sheds light on some common topic modeling methods in a short-text context and provides direction for researchers who seek to apply these methods.
format Online
Article
Text
id pubmed-7861298
institution National Center for Biotechnology Information
language English
publishDate 2020
publisher Frontiers Media S.A.
record_format MEDLINE/PubMed
spelling pubmed-7861298 2021-03-16 Using Topic Modeling Methods for Short-Text Data: A Comparative Analysis. Albalawi, Rania; Yeap, Tet Hin; Benyoucef, Morad. Front Artif Intell. Artificial Intelligence. Frontiers Media S.A. 2020-07-14 /pmc/articles/PMC7861298/ /pubmed/33733159 http://dx.doi.org/10.3389/frai.2020.00042 Text en Copyright © 2020 Albalawi, Yeap and Benyoucef.
http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
title Using Topic Modeling Methods for Short-Text Data: A Comparative Analysis
topic Artificial Intelligence
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7861298/
https://www.ncbi.nlm.nih.gov/pubmed/33733159
http://dx.doi.org/10.3389/frai.2020.00042