Short text topic modelling approaches in the context of big data: taxonomy, survey, and analysis
Social media platforms such as Twitter, Facebook, and Weibo are being increasingly embraced by individuals, groups, and organizations as a valuable source of information. This social-media-generated information comes in the form of tweets or posts and is normally characterized as short text, huge, s...
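Main authors: | Murshed, Belal Abdullah Hezam; Mallappa, Suresha; Abawajy, Jemal; Saif, Mufeed Ahmed Naji; Al-ariki, Hasib Daowd Esmail; Abdulwahab, Hudhaifa Mohammed |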
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Springer Netherlands, 2022 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9607740/ https://www.ncbi.nlm.nih.gov/pubmed/36320612 http://dx.doi.org/10.1007/s10462-022-10254-w |
_version_ | 1784818619894464512 |
---|---|
author | Murshed, Belal Abdullah Hezam; Mallappa, Suresha; Abawajy, Jemal; Saif, Mufeed Ahmed Naji; Al-ariki, Hasib Daowd Esmail; Abdulwahab, Hudhaifa Mohammed |
author_sort | Murshed, Belal Abdullah Hezam |
collection | PubMed |
description | Social media platforms such as Twitter, Facebook, and Weibo are being increasingly embraced by individuals, groups, and organizations as a valuable source of information. This social-media-generated information comes in the form of tweets or posts and is normally characterized as short text that is huge, sparse, and low-density. Since many real-world applications need semantic interpretation of such short texts, research in Short Text Topic Modeling (STTM), which aims to reveal unique and cohesive latent topics, has recently gained a lot of interest. This article examines the current state of the art in STTM algorithms. It presents a comprehensive survey and taxonomy of these algorithms. The article also includes a qualitative and quantitative study of the STTM algorithms, as well as analyses of the various strengths and drawbacks of STTM techniques. Moreover, a comparative analysis of the topic quality and performance of representative STTM models is presented. The performance evaluation is conducted on two real-world Twitter datasets, the Real-World Pandemic Twitter (RW-Pand-Twitter) dataset and the Real-World Cyberbullying Twitter (RW-CB-Twitter) dataset, in terms of several metrics such as topic coherence, purity, normalized mutual information (NMI), and accuracy. Finally, the open challenges and future research directions in this promising field are discussed to highlight the trends of research in STTM. The work presented in this paper is useful for researchers interested in learning about state-of-the-art short text topic modelling and for researchers focusing on developing new algorithms for it. |
format | Online Article Text |
id | pubmed-9607740 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2022 |
publisher | Springer Netherlands |
record_format | MEDLINE/PubMed |
spelling | pubmed-9607740 2022-10-28 Artif Intell Rev Article Springer Netherlands 2022-10-26 2023 /pmc/articles/PMC9607740/ /pubmed/36320612 http://dx.doi.org/10.1007/s10462-022-10254-w Text en © The Author(s), under exclusive licence to Springer Nature B.V. 2022, Springer Nature or its licensor holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law. This article is made available via the PMC Open Access Subset for unrestricted research re-use and secondary analysis in any form or by any means with acknowledgement of the original source. These permissions are granted for the duration of the World Health Organization (WHO) declaration of COVID-19 as a global pandemic. |
title | Short text topic modelling approaches in the context of big data: taxonomy, survey, and analysis |
topic | Article |