
An effective short-text topic modelling with neighbourhood assistance-driven NMF in Twitter

Social media such as Twitter connect billions of people by allowing them to exchange their thoughts via short-text communication. Topic modelling is a widely used technique for analysing short texts. Discovering topic clusters in short-text collections faces issues with distance-based, density-based...


Bibliographic Details
Main Authors: Athukorala, Shalani; Mohotti, Wathsala
Format: Online Article Text
Language: English
Published: Springer Vienna 2022
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9309003/
https://www.ncbi.nlm.nih.gov/pubmed/35911485
http://dx.doi.org/10.1007/s13278-022-00898-5
_version_ 1784753063496515584
author Athukorala, Shalani
Mohotti, Wathsala
author_facet Athukorala, Shalani
Mohotti, Wathsala
author_sort Athukorala, Shalani
collection PubMed
description Social media such as Twitter connect billions of people by allowing them to exchange their thoughts via short-text communication. Topic modelling is a widely used technique for analysing short texts. Distance-based, density-based and dimensionality reduction-based methods struggle to discover topic clusters in short-text collections, because the high dimensionality and short length of the texts result in extremely sparse text representation matrices. We propose the ‘neighbourhood-based assistance’-driven non-negative matrix factorization (NMF) method to handle high-dimensional sparse short-text representations effectively through lower-dimensional projection. We use NMF, which aligns with the natural non-negativity of text data, coupled with symmetric document affinity information to identify the topic distribution in short text. Neighbourhood information within documents is captured using Jaccard similarity to compensate for the information loss that results from higher-to-lower-dimensional projection. Experimental results with Twitter data sets show that the proposed approach attains high accuracy compared to state-of-the-art methods quantitatively, while qualitative analysis with case studies validates its ability to generate meaningful topic clusters.
format Online
Article
Text
id pubmed-9309003
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher Springer Vienna
record_format MEDLINE/PubMed
spelling pubmed-9309003 2022-07-25
An effective short-text topic modelling with neighbourhood assistance-driven NMF in Twitter
Athukorala, Shalani
Mohotti, Wathsala
Soc Netw Anal Min
Original Article
Social media such as Twitter connect billions of people by allowing them to exchange their thoughts via short-text communication. Topic modelling is a widely used technique for analysing short texts. Distance-based, density-based and dimensionality reduction-based methods struggle to discover topic clusters in short-text collections, because the high dimensionality and short length of the texts result in extremely sparse text representation matrices. We propose the ‘neighbourhood-based assistance’-driven non-negative matrix factorization (NMF) method to handle high-dimensional sparse short-text representations effectively through lower-dimensional projection. We use NMF, which aligns with the natural non-negativity of text data, coupled with symmetric document affinity information to identify the topic distribution in short text. Neighbourhood information within documents is captured using Jaccard similarity to compensate for the information loss that results from higher-to-lower-dimensional projection. Experimental results with Twitter data sets show that the proposed approach attains high accuracy compared to state-of-the-art methods quantitatively, while qualitative analysis with case studies validates its ability to generate meaningful topic clusters.
Springer Vienna 2022-07-24 2022
/pmc/articles/PMC9309003/
/pubmed/35911485
http://dx.doi.org/10.1007/s13278-022-00898-5
Text en
© The Author(s), under exclusive licence to Springer-Verlag GmbH Austria, part of Springer Nature 2022
This article is made available via the PMC Open Access Subset for unrestricted research re-use and secondary analysis in any form or by any means with acknowledgement of the original source. These permissions are granted for the duration of the World Health Organization (WHO) declaration of COVID-19 as a global pandemic.
spellingShingle Original Article
Athukorala, Shalani
Mohotti, Wathsala
An effective short-text topic modelling with neighbourhood assistance-driven NMF in Twitter
title An effective short-text topic modelling with neighbourhood assistance-driven NMF in Twitter
topic Original Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9309003/
https://www.ncbi.nlm.nih.gov/pubmed/35911485
http://dx.doi.org/10.1007/s13278-022-00898-5
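
The description in this record says that neighbourhood information is captured with Jaccard similarity and used as symmetric document affinity information alongside NMF. The record does not give the paper's exact formulation, so the following is only a minimal sketch of the Jaccard step, assuming lower-cased whitespace tokenisation; the function names and sample tweets are hypothetical, and the NMF factorization itself is not shown.

```python
from itertools import combinations


def jaccard(a: set, b: set) -> float:
    """Jaccard similarity |a ∩ b| / |a ∪ b|; returns 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def affinity_matrix(docs):
    """Build a symmetric document-by-document Jaccard affinity matrix.

    Assumption: each document is tokenised by lower-casing and splitting
    on whitespace. The diagonal is set to 1.0 by convention.
    """
    tokens = [set(d.lower().split()) for d in docs]
    n = len(tokens)
    S = [[0.0] * n for _ in range(n)]
    for i in range(n):
        S[i][i] = 1.0
    # Each unordered pair is computed once and mirrored to keep S symmetric.
    for i, j in combinations(range(n), 2):
        S[i][j] = S[j][i] = jaccard(tokens[i], tokens[j])
    return S


# Hypothetical short texts, for illustration only.
tweets = [
    "nmf topic model for tweets",
    "topic model for short tweets",
    "football match tonight",
]
S = affinity_matrix(tweets)
```

Here the first two texts share four of six distinct words, so their affinity is 4/6, while the third text shares no words with either and gets an affinity of 0. How such a matrix is combined with the NMF objective is specific to the paper and is not reproduced in this record.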