Assessing Suicide Risk and Emotional Distress in Chinese Social Media: A Text Mining and Machine Learning Study

BACKGROUND: Early identification and intervention are imperative for suicide prevention. However, at-risk people often neither seek help nor take professional assessment. A tool to automatically assess their risk levels in natural settings can increase the opportunity for early intervention. OBJECTI...

Bibliographic Details
Main Authors: Cheng, Qijin, Li, Tim MH, Kwok, Chi-Leung, Zhu, Tingshao, Yip, Paul SF
Format: Online Article Text
Language: English
Published: JMIR Publications 2017
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5525005/
https://www.ncbi.nlm.nih.gov/pubmed/28694239
http://dx.doi.org/10.2196/jmir.7276
_version_ 1783252569326354432
author Cheng, Qijin
Li, Tim MH
Kwok, Chi-Leung
Zhu, Tingshao
Yip, Paul SF
author_facet Cheng, Qijin
Li, Tim MH
Kwok, Chi-Leung
Zhu, Tingshao
Yip, Paul SF
author_sort Cheng, Qijin
collection PubMed
description BACKGROUND: Early identification and intervention are imperative for suicide prevention. However, at-risk people often neither seek help nor take professional assessment. A tool to automatically assess their risk levels in natural settings can increase the opportunity for early intervention. OBJECTIVE: The aim of this study was to explore whether computerized language analysis methods can be utilized to assess one’s suicide risk and emotional distress in Chinese social media. METHODS: A Web-based survey of Chinese social media (ie, Weibo) users was conducted to measure their suicide risk factors including suicide probability, Weibo suicide communication (WSC), depression, anxiety, and stress levels. Participants’ Weibo posts published in the public domain were also downloaded with their consent. The Weibo posts were parsed and fitted into Simplified Chinese-Linguistic Inquiry and Word Count (SC-LIWC) categories. The associations between SC-LIWC features and the 5 suicide risk factors were examined by logistic regression. Furthermore, the support vector machine (SVM) model was applied based on the language features to automatically classify whether a Weibo user exhibited any of the 5 risk factors. RESULTS: A total of 974 Weibo users participated in the survey. Those with high suicide probability were marked by a higher usage of pronoun (odds ratio, OR=1.18, P=.001), prepend words (OR=1.49, P=.02), multifunction words (OR=1.12, P=.04), a lower usage of verb (OR=0.78, P<.001), and a greater total word count (OR=1.007, P=.008). Second-person plural was positively associated with severe depression (OR=8.36, P=.01) and stress (OR=11, P=.005), whereas work-related words were negatively associated with WSC (OR=0.71, P=.008), severe depression (OR=0.56, P=.005), and anxiety (OR=0.77, P=.02). Inconsistently, third-person plural was found to be negatively associated with WSC (OR=0.02, P=.047) but positively with severe stress (OR=41.3, P=.04). 
Achievement-related words were positively associated with depression (OR=1.68, P=.003), whereas health- (OR=2.36, P=.004) and death-related (OR=2.60, P=.01) words positively associated with stress. The machine classifiers did not achieve satisfying performance in the full sample set but could classify high suicide probability (area under the curve, AUC=0.61, P=.04) and severe anxiety (AUC=0.75, P<.001) among those who have exhibited WSC. CONCLUSIONS: SC-LIWC is useful to examine language markers of suicide risk and emotional distress in Chinese social media and can identify characteristics different from previous findings in the English literature. Some findings are leading to new hypotheses for future verification. Machine classifiers based on SC-LIWC features are promising but still require further optimization for application in real life.
format Online
Article
Text
id pubmed-5525005
institution National Center for Biotechnology Information
language English
publishDate 2017
publisher JMIR Publications
record_format MEDLINE/PubMed
spelling pubmed-55250052017-08-11 Assessing Suicide Risk and Emotional Distress in Chinese Social Media: A Text Mining and Machine Learning Study Cheng, Qijin Li, Tim MH Kwok, Chi-Leung Zhu, Tingshao Yip, Paul SF J Med Internet Res Original Paper BACKGROUND: Early identification and intervention are imperative for suicide prevention. However, at-risk people often neither seek help nor take professional assessment. A tool to automatically assess their risk levels in natural settings can increase the opportunity for early intervention. OBJECTIVE: The aim of this study was to explore whether computerized language analysis methods can be utilized to assess one’s suicide risk and emotional distress in Chinese social media. METHODS: A Web-based survey of Chinese social media (ie, Weibo) users was conducted to measure their suicide risk factors including suicide probability, Weibo suicide communication (WSC), depression, anxiety, and stress levels. Participants’ Weibo posts published in the public domain were also downloaded with their consent. The Weibo posts were parsed and fitted into Simplified Chinese-Linguistic Inquiry and Word Count (SC-LIWC) categories. The associations between SC-LIWC features and the 5 suicide risk factors were examined by logistic regression. Furthermore, the support vector machine (SVM) model was applied based on the language features to automatically classify whether a Weibo user exhibited any of the 5 risk factors. RESULTS: A total of 974 Weibo users participated in the survey. Those with high suicide probability were marked by a higher usage of pronoun (odds ratio, OR=1.18, P=.001), prepend words (OR=1.49, P=.02), multifunction words (OR=1.12, P=.04), a lower usage of verb (OR=0.78, P<.001), and a greater total word count (OR=1.007, P=.008). 
Second-person plural was positively associated with severe depression (OR=8.36, P=.01) and stress (OR=11, P=.005), whereas work-related words were negatively associated with WSC (OR=0.71, P=.008), severe depression (OR=0.56, P=.005), and anxiety (OR=0.77, P=.02). Inconsistently, third-person plural was found to be negatively associated with WSC (OR=0.02, P=.047) but positively with severe stress (OR=41.3, P=.04). Achievement-related words were positively associated with depression (OR=1.68, P=.003), whereas health- (OR=2.36, P=.004) and death-related (OR=2.60, P=.01) words positively associated with stress. The machine classifiers did not achieve satisfying performance in the full sample set but could classify high suicide probability (area under the curve, AUC=0.61, P=.04) and severe anxiety (AUC=0.75, P<.001) among those who have exhibited WSC. CONCLUSIONS: SC-LIWC is useful to examine language markers of suicide risk and emotional distress in Chinese social media and can identify characteristics different from previous findings in the English literature. Some findings are leading to new hypotheses for future verification. Machine classifiers based on SC-LIWC features are promising but still require further optimization for application in real life. JMIR Publications 2017-07-10 /pmc/articles/PMC5525005/ /pubmed/28694239 http://dx.doi.org/10.2196/jmir.7276 Text en ©Qijin Cheng, Tim MH Li, Chi-Leung Kwok, Tingshao Zhu, Paul SF Yip. Originally published in the Journal of Medical Internet Research (http://www.jmir.org), 10.07.2017. https://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in the Journal of Medical Internet Research, is properly cited. 
The complete bibliographic information, a link to the original publication on http://www.jmir.org/, as well as this copyright and license information must be included.
spellingShingle Original Paper
Cheng, Qijin
Li, Tim MH
Kwok, Chi-Leung
Zhu, Tingshao
Yip, Paul SF
Assessing Suicide Risk and Emotional Distress in Chinese Social Media: A Text Mining and Machine Learning Study
title Assessing Suicide Risk and Emotional Distress in Chinese Social Media: A Text Mining and Machine Learning Study
title_full Assessing Suicide Risk and Emotional Distress in Chinese Social Media: A Text Mining and Machine Learning Study
title_fullStr Assessing Suicide Risk and Emotional Distress in Chinese Social Media: A Text Mining and Machine Learning Study
title_full_unstemmed Assessing Suicide Risk and Emotional Distress in Chinese Social Media: A Text Mining and Machine Learning Study
title_short Assessing Suicide Risk and Emotional Distress in Chinese Social Media: A Text Mining and Machine Learning Study
title_sort assessing suicide risk and emotional distress in chinese social media: a text mining and machine learning study
topic Original Paper
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5525005/
https://www.ncbi.nlm.nih.gov/pubmed/28694239
http://dx.doi.org/10.2196/jmir.7276
work_keys_str_mv AT chengqijin assessingsuicideriskandemotionaldistressinchinesesocialmediaatextminingandmachinelearningstudy
AT litimmh assessingsuicideriskandemotionaldistressinchinesesocialmediaatextminingandmachinelearningstudy
AT kwokchileung assessingsuicideriskandemotionaldistressinchinesesocialmediaatextminingandmachinelearningstudy
AT zhutingshao assessingsuicideriskandemotionaldistressinchinesesocialmediaatextminingandmachinelearningstudy
AT yippaulsf assessingsuicideriskandemotionaldistressinchinesesocialmediaatextminingandmachinelearningstudy