A Topic Modeling Comparison Between LDA, NMF, Top2Vec, and BERTopic to Demystify Twitter Posts
The richness of social media data has opened a new avenue for social science research to gain insights into human behaviors and experiences. In particular, emerging data-driven approaches relying on topic models provide entirely new perspectives on interpreting social phenomena. However, the short,...
Main Authors: | Egger, Roman; Yu, Joanne |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Frontiers Media S.A. 2022 |
Subjects: | Sociology |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9120935/ https://www.ncbi.nlm.nih.gov/pubmed/35602001 http://dx.doi.org/10.3389/fsoc.2022.886498 |
_version_ | 1784711044910809088 |
---|---|
author | Egger, Roman Yu, Joanne |
author_facet | Egger, Roman Yu, Joanne |
author_sort | Egger, Roman |
collection | PubMed |
description | The richness of social media data has opened a new avenue for social science research to gain insights into human behaviors and experiences. In particular, emerging data-driven approaches relying on topic models provide entirely new perspectives on interpreting social phenomena. However, the short, text-heavy, and unstructured nature of social media content often leads to methodological challenges in both data collection and analysis. In order to bridge the developing field of computational science and empirical social research, this study aims to evaluate the performance of four topic modeling techniques; namely latent Dirichlet allocation (LDA), non-negative matrix factorization (NMF), Top2Vec, and BERTopic. In view of the interplay between human relations and digital media, this research takes Twitter posts as the reference point and assesses the performance of different algorithms concerning their strengths and weaknesses in a social science context. Based on certain details during the analytical procedures and on quality issues, this research sheds light on the efficacy of using BERTopic and NMF to analyze Twitter data. |
format | Online Article Text |
id | pubmed-9120935 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2022 |
publisher | Frontiers Media S.A. |
record_format | MEDLINE/PubMed |
spelling | pubmed-9120935 2022-05-21 A Topic Modeling Comparison Between LDA, NMF, Top2Vec, and BERTopic to Demystify Twitter Posts Egger, Roman Yu, Joanne Front Sociol Sociology The richness of social media data has opened a new avenue for social science research to gain insights into human behaviors and experiences. In particular, emerging data-driven approaches relying on topic models provide entirely new perspectives on interpreting social phenomena. However, the short, text-heavy, and unstructured nature of social media content often leads to methodological challenges in both data collection and analysis. In order to bridge the developing field of computational science and empirical social research, this study aims to evaluate the performance of four topic modeling techniques; namely latent Dirichlet allocation (LDA), non-negative matrix factorization (NMF), Top2Vec, and BERTopic. In view of the interplay between human relations and digital media, this research takes Twitter posts as the reference point and assesses the performance of different algorithms concerning their strengths and weaknesses in a social science context. Based on certain details during the analytical procedures and on quality issues, this research sheds light on the efficacy of using BERTopic and NMF to analyze Twitter data. Frontiers Media S.A. 2022-05-06 /pmc/articles/PMC9120935/ /pubmed/35602001 http://dx.doi.org/10.3389/fsoc.2022.886498 Text en Copyright © 2022 Egger and Yu. https://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms. |
spellingShingle | Sociology Egger, Roman Yu, Joanne A Topic Modeling Comparison Between LDA, NMF, Top2Vec, and BERTopic to Demystify Twitter Posts |
title | A Topic Modeling Comparison Between LDA, NMF, Top2Vec, and BERTopic to Demystify Twitter Posts |
title_full | A Topic Modeling Comparison Between LDA, NMF, Top2Vec, and BERTopic to Demystify Twitter Posts |
title_fullStr | A Topic Modeling Comparison Between LDA, NMF, Top2Vec, and BERTopic to Demystify Twitter Posts |
title_full_unstemmed | A Topic Modeling Comparison Between LDA, NMF, Top2Vec, and BERTopic to Demystify Twitter Posts |
title_short | A Topic Modeling Comparison Between LDA, NMF, Top2Vec, and BERTopic to Demystify Twitter Posts |
title_sort | topic modeling comparison between lda, nmf, top2vec, and bertopic to demystify twitter posts |
topic | Sociology |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9120935/ https://www.ncbi.nlm.nih.gov/pubmed/35602001 http://dx.doi.org/10.3389/fsoc.2022.886498 |
work_keys_str_mv | AT eggerroman atopicmodelingcomparisonbetweenldanmftop2vecandbertopictodemystifytwitterposts AT yujoanne atopicmodelingcomparisonbetweenldanmftop2vecandbertopictodemystifytwitterposts AT eggerroman topicmodelingcomparisonbetweenldanmftop2vecandbertopictodemystifytwitterposts AT yujoanne topicmodelingcomparisonbetweenldanmftop2vecandbertopictodemystifytwitterposts |