
Short text topic modelling approaches in the context of big data: taxonomy, survey, and analysis

Social media platforms such as Twitter, Facebook, and Weibo are being increasingly embraced by individuals, groups, and organizations as a valuable source of information. This social-media-generated information comes in the form of tweets or posts and is normally characterized as short text, huge, s...


Bibliographic Details
Main authors: Murshed, Belal Abdullah Hezam; Mallappa, Suresha; Abawajy, Jemal; Saif, Mufeed Ahmed Naji; Al-ariki, Hasib Daowd Esmail; Abdulwahab, Hudhaifa Mohammed
Format: Online Article Text
Language: English
Published: Springer Netherlands 2022
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9607740/
https://www.ncbi.nlm.nih.gov/pubmed/36320612
http://dx.doi.org/10.1007/s10462-022-10254-w
author Murshed, Belal Abdullah Hezam
Mallappa, Suresha
Abawajy, Jemal
Saif, Mufeed Ahmed Naji
Al-ariki, Hasib Daowd Esmail
Abdulwahab, Hudhaifa Mohammed
collection PubMed
description Social media platforms such as Twitter, Facebook, and Weibo are being increasingly embraced by individuals, groups, and organizations as a valuable source of information. This social-media-generated information comes in the form of tweets or posts and is normally characterized as short text, huge, sparse, and low density. Since many real-world applications need semantic interpretation of such short texts, research in Short Text Topic Modelling (STTM), which aims to reveal unique and cohesive latent topics, has recently gained a lot of interest. This article examines the current state of the art in STTM algorithms and presents a comprehensive survey and taxonomy of them. The article also includes a qualitative and quantitative study of the STTM algorithms, as well as an analysis of the strengths and drawbacks of STTM techniques. Moreover, a comparative analysis of the topic quality and performance of representative STTM models is presented. The performance evaluation is conducted on two real-world Twitter datasets, the Real-World Pandemic Twitter (RW-Pand-Twitter) dataset and the Real-World Cyberbullying Twitter (RW-CB-Twitter) dataset, in terms of several metrics such as topic coherence, purity, normalized mutual information (NMI), and accuracy. Finally, the open challenges and future research directions in this field are discussed to highlight the trends of research in STTM. The work presented in this paper is useful for researchers interested in learning about state-of-the-art short text topic modelling and for researchers developing new STTM algorithms.
format Online
Article
Text
id pubmed-9607740
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher Springer Netherlands
record_format MEDLINE/PubMed
spelling pubmed-9607740 2022-10-28 Artif Intell Rev Article
Springer Netherlands 2022-10-26 2023 /pmc/articles/PMC9607740/ /pubmed/36320612 http://dx.doi.org/10.1007/s10462-022-10254-w Text en © The Author(s), under exclusive licence to Springer Nature B.V. 2022, Springer Nature or its licensor holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law. This article is made available via the PMC Open Access Subset for unrestricted research re-use and secondary analysis in any form or by any means with acknowledgement of the original source. These permissions are granted for the duration of the World Health Organization (WHO) declaration of COVID-19 as a global pandemic.
topic Article