A Topic Modeling Comparison Between LDA, NMF, Top2Vec, and BERTopic to Demystify Twitter Posts

The richness of social media data has opened a new avenue for social science research to gain insights into human behaviors and experiences. In particular, emerging data-driven approaches relying on topic models provide entirely new perspectives on interpreting social phenomena. However, the short,...

Bibliographic Details
Main authors: Egger, Roman; Yu, Joanne
Format: Online Article Text
Language: English
Published: Frontiers Media S.A. 2022
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9120935/
https://www.ncbi.nlm.nih.gov/pubmed/35602001
http://dx.doi.org/10.3389/fsoc.2022.886498
author Egger, Roman
Yu, Joanne
collection PubMed
description The richness of social media data has opened a new avenue for social science research to gain insights into human behaviors and experiences. In particular, emerging data-driven approaches relying on topic models provide entirely new perspectives on interpreting social phenomena. However, the short, text-heavy, and unstructured nature of social media content often leads to methodological challenges in both data collection and analysis. To bridge the developing field of computational science and empirical social research, this study evaluates the performance of four topic modeling techniques, namely latent Dirichlet allocation (LDA), non-negative matrix factorization (NMF), Top2Vec, and BERTopic. In view of the interplay between human relations and digital media, this research takes Twitter posts as the reference point and assesses the strengths and weaknesses of the different algorithms in a social science context. Based on certain details during the analytical procedures and on quality issues, this research sheds light on the efficacy of using BERTopic and NMF to analyze Twitter data.
format Online
Article
Text
id pubmed-9120935
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher Frontiers Media S.A.
record_format MEDLINE/PubMed
spelling pubmed-9120935 2022-05-21
A Topic Modeling Comparison Between LDA, NMF, Top2Vec, and BERTopic to Demystify Twitter Posts
Egger, Roman
Yu, Joanne
Front Sociol
Sociology
Frontiers Media S.A. 2022-05-06
/pmc/articles/PMC9120935/
/pubmed/35602001
http://dx.doi.org/10.3389/fsoc.2022.886498
Text en
Copyright © 2022 Egger and Yu. https://creativecommons.org/licenses/by/4.0/
This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
title A Topic Modeling Comparison Between LDA, NMF, Top2Vec, and BERTopic to Demystify Twitter Posts
topic Sociology
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9120935/
https://www.ncbi.nlm.nih.gov/pubmed/35602001
http://dx.doi.org/10.3389/fsoc.2022.886498