
Exploring Eating Disorder Topics on Twitter: Machine Learning Approach

BACKGROUND: Eating disorders (EDs) are a group of mental illnesses that have an adverse effect on both mental and physical health. As social media platforms (eg, Twitter) have become an important data source for public health research, some studies have qualitatively explored the ways in which EDs a...


Bibliographic Details
Main Authors:	Zhou, Sicheng; Zhao, Yunpeng; Bian, Jiang; Haynos, Ann F; Zhang, Rui
Format:	Online Article Text
Language:	English
Published:	JMIR Publications 2020
Subjects:	Original Paper
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7665945/ https://www.ncbi.nlm.nih.gov/pubmed/33124997 http://dx.doi.org/10.2196/18273

_version_	1783610058102276096
author	Zhou, Sicheng Zhao, Yunpeng Bian, Jiang Haynos, Ann F Zhang, Rui
author_facet	Zhou, Sicheng Zhao, Yunpeng Bian, Jiang Haynos, Ann F Zhang, Rui
author_sort	Zhou, Sicheng
collection	PubMed
description	BACKGROUND: Eating disorders (EDs) are a group of mental illnesses that have an adverse effect on both mental and physical health. As social media platforms (eg, Twitter) have become an important data source for public health research, some studies have qualitatively explored the ways in which EDs are discussed on these platforms. Initial results suggest that such research offers a promising method for further understanding this group of diseases. Nevertheless, an efficient computational method is needed to further identify and analyze tweets relevant to EDs on a larger scale. OBJECTIVE: This study aims to develop and validate a machine learning–based classifier to identify tweets related to EDs and to explore factors (ie, topics) related to EDs using a topic modeling method. METHODS: We collected potential ED-relevant tweets using keywords from previous studies and annotated these tweets into different groups (ie, ED relevant vs irrelevant and then promotional information vs laypeople discussion). Several supervised machine learning methods, such as convolutional neural network (CNN), long short-term memory (LSTM), support vector machine, and naïve Bayes, were developed and evaluated using annotated data. We used the classifier with the best performance to identify ED-relevant tweets and applied a topic modeling method, Correlation Explanation (CorEx), to analyze the content of the identified tweets. To validate these machine learning results, we also collected a cohort of ED-relevant tweets on the basis of manually curated rules. RESULTS: A total of 123,977 tweets were collected during the set period. We randomly annotated 2219 tweets for developing the machine learning classifiers. We developed a CNN-LSTM classifier to identify ED-relevant tweets published by laypeople in 2 steps: first relevant versus irrelevant (F1 score=0.89) and then promotional versus published by laypeople (F1 score=0.90). A total of 40,790 ED-relevant tweets were identified using the CNN-LSTM classifier. We also identified another set of tweets (ie, 17,632 ED-relevant and 83,557 ED-irrelevant tweets) posted by laypeople using manually specified rules. Using CorEx on all ED-relevant tweets, the topic model identified 162 topics. Overall, the coherence rate for topic modeling was 77.07% (1264/1640), indicating a high quality of the produced topics. The topics were further reviewed and analyzed by a domain expert. CONCLUSIONS: A developed CNN-LSTM classifier could improve the efficiency of identifying ED-relevant tweets compared with the traditional manual-based method. The CorEx topic model was applied on the tweets identified by the machine learning–based classifier and the traditional manual approach separately. Highly overlapping topics were observed between the 2 cohorts of tweets. The produced topics were further reviewed by a domain expert. Some of the topics identified by the potential ED tweets may provide new avenues for understanding this serious set of disorders.
format	Online Article Text
id	pubmed-7665945
institution	National Center for Biotechnology Information
language	English
publishDate	2020
publisher	JMIR Publications
record_format	MEDLINE/PubMed
spelling	pubmed-7665945 2020-11-19 Exploring Eating Disorder Topics on Twitter: Machine Learning Approach Zhou, Sicheng Zhao, Yunpeng Bian, Jiang Haynos, Ann F Zhang, Rui JMIR Med Inform Original Paper BACKGROUND: Eating disorders (EDs) are a group of mental illnesses that have an adverse effect on both mental and physical health. As social media platforms (eg, Twitter) have become an important data source for public health research, some studies have qualitatively explored the ways in which EDs are discussed on these platforms. Initial results suggest that such research offers a promising method for further understanding this group of diseases. Nevertheless, an efficient computational method is needed to further identify and analyze tweets relevant to EDs on a larger scale. OBJECTIVE: This study aims to develop and validate a machine learning–based classifier to identify tweets related to EDs and to explore factors (ie, topics) related to EDs using a topic modeling method. METHODS: We collected potential ED-relevant tweets using keywords from previous studies and annotated these tweets into different groups (ie, ED relevant vs irrelevant and then promotional information vs laypeople discussion). Several supervised machine learning methods, such as convolutional neural network (CNN), long short-term memory (LSTM), support vector machine, and naïve Bayes, were developed and evaluated using annotated data. We used the classifier with the best performance to identify ED-relevant tweets and applied a topic modeling method, Correlation Explanation (CorEx), to analyze the content of the identified tweets. To validate these machine learning results, we also collected a cohort of ED-relevant tweets on the basis of manually curated rules. RESULTS: A total of 123,977 tweets were collected during the set period. We randomly annotated 2219 tweets for developing the machine learning classifiers. We developed a CNN-LSTM classifier to identify ED-relevant tweets published by laypeople in 2 steps: first relevant versus irrelevant (F1 score=0.89) and then promotional versus published by laypeople (F1 score=0.90). A total of 40,790 ED-relevant tweets were identified using the CNN-LSTM classifier. We also identified another set of tweets (ie, 17,632 ED-relevant and 83,557 ED-irrelevant tweets) posted by laypeople using manually specified rules. Using CorEx on all ED-relevant tweets, the topic model identified 162 topics. Overall, the coherence rate for topic modeling was 77.07% (1264/1640), indicating a high quality of the produced topics. The topics were further reviewed and analyzed by a domain expert. CONCLUSIONS: A developed CNN-LSTM classifier could improve the efficiency of identifying ED-relevant tweets compared with the traditional manual-based method. The CorEx topic model was applied on the tweets identified by the machine learning–based classifier and the traditional manual approach separately. Highly overlapping topics were observed between the 2 cohorts of tweets. The produced topics were further reviewed by a domain expert. Some of the topics identified by the potential ED tweets may provide new avenues for understanding this serious set of disorders. JMIR Publications 2020-10-30 /pmc/articles/PMC7665945/ /pubmed/33124997 http://dx.doi.org/10.2196/18273 Text en ©Sicheng Zhou, Yunpeng Zhao, Jiang Bian, Ann F Haynos, Rui Zhang. Originally published in JMIR Medical Informatics (http://medinform.jmir.org), 30.10.2020. https://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in JMIR Medical Informatics, is properly cited. The complete bibliographic information, a link to the original publication on http://medinform.jmir.org/, as well as this copyright and license information must be included.
spellingShingle	Original Paper Zhou, Sicheng Zhao, Yunpeng Bian, Jiang Haynos, Ann F Zhang, Rui Exploring Eating Disorder Topics on Twitter: Machine Learning Approach
title	Exploring Eating Disorder Topics on Twitter: Machine Learning Approach
title_full	Exploring Eating Disorder Topics on Twitter: Machine Learning Approach
title_fullStr	Exploring Eating Disorder Topics on Twitter: Machine Learning Approach
title_full_unstemmed	Exploring Eating Disorder Topics on Twitter: Machine Learning Approach
title_short	Exploring Eating Disorder Topics on Twitter: Machine Learning Approach
title_sort	exploring eating disorder topics on twitter: machine learning approach
topic	Original Paper
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7665945/ https://www.ncbi.nlm.nih.gov/pubmed/33124997 http://dx.doi.org/10.2196/18273
work_keys_str_mv	AT zhousicheng exploringeatingdisordertopicsontwittermachinelearningapproach AT zhaoyunpeng exploringeatingdisordertopicsontwittermachinelearningapproach AT bianjiang exploringeatingdisordertopicsontwittermachinelearningapproach AT haynosannf exploringeatingdisordertopicsontwittermachinelearningapproach AT zhangrui exploringeatingdisordertopicsontwittermachinelearningapproach


Similar Items