Aggregating Twitter Text through Generalized Linear Regression Models for Tweet Popularity Prediction and Automatic Topic Classification

Social media platforms have become accessible resources for health data analysis. However, the advanced computational techniques involved in big data text mining and analysis are challenging for public health data analysts to apply. This study proposes and explores the feasibility of a novel yet str...

Bibliographic Details
Main Authors:	Mo, Chen; Yin, Jingjing; Fung, Isaac Chun-Hai; Tse, Zion Tsz Ho
Format:	Online Article Text
Language:	English
Published:	MDPI 2021
Subjects:	Article
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8700529/ https://www.ncbi.nlm.nih.gov/pubmed/34940387 http://dx.doi.org/10.3390/ejihpe11040109

author	Mo, Chen; Yin, Jingjing; Fung, Isaac Chun-Hai; Tse, Zion Tsz Ho
collection	PubMed
description	Social media platforms have become accessible resources for health data analysis. However, the advanced computational techniques involved in big data text mining and analysis are challenging for public health data analysts to apply. This study proposes and explores the feasibility of a novel yet straightforward method by regressing the outcome of interest on the aggregated influence scores for association and/or classification analyses based on generalized linear models. The method reduces the document term matrix by transforming text data into a continuous summary score, thereby reducing the data dimension substantially and easing the data sparsity issue of the term matrix. To illustrate the proposed method in detailed steps, we used three Twitter datasets on various topics: autism spectrum disorder, influenza, and violence against women. We found that our results were generally consistent with the critical factors associated with the specific public health topic in the existing literature. The proposed method could also classify tweets into different topic groups appropriately with consistent performance compared with existing text mining methods for automatic classification based on tweet contents.
format	Online Article Text
id	pubmed-8700529
institution	National Center for Biotechnology Information
language	English
publishDate	2021
publisher	MDPI
record_format	MEDLINE/PubMed
spelling	pubmed-8700529 2021-12-24 Eur J Investig Health Psychol Educ Article MDPI 2021-11-26 /pmc/articles/PMC8700529/ /pubmed/34940387 http://dx.doi.org/10.3390/ejihpe11040109 Text en © 2021 by the authors. https://creativecommons.org/licenses/by/4.0/ Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
title	Aggregating Twitter Text through Generalized Linear Regression Models for Tweet Popularity Prediction and Automatic Topic Classification
topic	Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8700529/ https://www.ncbi.nlm.nih.gov/pubmed/34940387 http://dx.doi.org/10.3390/ejihpe11040109