Assessing Suicide Risk and Emotional Distress in Chinese Social Media: A Text Mining and Machine Learning Study
BACKGROUND: Early identification and intervention are imperative for suicide prevention. However, at-risk people often neither seek help nor take professional assessment. A tool to automatically assess their risk levels in natural settings can increase the opportunity for early intervention. OBJECTI...
Main Authors: | Cheng, Qijin, Li, Tim MH, Kwok, Chi-Leung, Zhu, Tingshao, Yip, Paul SF |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | JMIR Publications 2017 |
Subjects: | |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5525005/ https://www.ncbi.nlm.nih.gov/pubmed/28694239 http://dx.doi.org/10.2196/jmir.7276 |
_version_ | 1783252569326354432 |
---|---|
author | Cheng, Qijin Li, Tim MH Kwok, Chi-Leung Zhu, Tingshao Yip, Paul SF |
author_sort | Cheng, Qijin |
collection | PubMed |
description | BACKGROUND: Early identification and intervention are imperative for suicide prevention. However, at-risk people often neither seek help nor take professional assessment. A tool to automatically assess their risk levels in natural settings can increase the opportunity for early intervention. OBJECTIVE: The aim of this study was to explore whether computerized language analysis methods can be utilized to assess one’s suicide risk and emotional distress in Chinese social media. METHODS: A Web-based survey of Chinese social media (ie, Weibo) users was conducted to measure their suicide risk factors including suicide probability, Weibo suicide communication (WSC), depression, anxiety, and stress levels. Participants’ Weibo posts published in the public domain were also downloaded with their consent. The Weibo posts were parsed and fitted into Simplified Chinese-Linguistic Inquiry and Word Count (SC-LIWC) categories. The associations between SC-LIWC features and the 5 suicide risk factors were examined by logistic regression. Furthermore, the support vector machine (SVM) model was applied based on the language features to automatically classify whether a Weibo user exhibited any of the 5 risk factors. RESULTS: A total of 974 Weibo users participated in the survey. Those with high suicide probability were marked by a higher usage of pronoun (odds ratio, OR=1.18, P=.001), prepend words (OR=1.49, P=.02), multifunction words (OR=1.12, P=.04), a lower usage of verb (OR=0.78, P<.001), and a greater total word count (OR=1.007, P=.008). Second-person plural was positively associated with severe depression (OR=8.36, P=.01) and stress (OR=11, P=.005), whereas work-related words were negatively associated with WSC (OR=0.71, P=.008), severe depression (OR=0.56, P=.005), and anxiety (OR=0.77, P=.02). Inconsistently, third-person plural was found to be negatively associated with WSC (OR=0.02, P=.047) but positively with severe stress (OR=41.3, P=.04). Achievement-related words were positively associated with depression (OR=1.68, P=.003), whereas health- (OR=2.36, P=.004) and death-related (OR=2.60, P=.01) words positively associated with stress. The machine classifiers did not achieve satisfying performance in the full sample set but could classify high suicide probability (area under the curve, AUC=0.61, P=.04) and severe anxiety (AUC=0.75, P<.001) among those who have exhibited WSC. CONCLUSIONS: SC-LIWC is useful to examine language markers of suicide risk and emotional distress in Chinese social media and can identify characteristics different from previous findings in the English literature. Some findings are leading to new hypotheses for future verification. Machine classifiers based on SC-LIWC features are promising but still require further optimization for application in real life. |
format | Online Article Text |
id | pubmed-5525005 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2017 |
publisher | JMIR Publications |
record_format | MEDLINE/PubMed |
spelling | pubmed-55250052017-08-11 Assessing Suicide Risk and Emotional Distress in Chinese Social Media: A Text Mining and Machine Learning Study Cheng, Qijin Li, Tim MH Kwok, Chi-Leung Zhu, Tingshao Yip, Paul SF J Med Internet Res Original Paper BACKGROUND: Early identification and intervention are imperative for suicide prevention. However, at-risk people often neither seek help nor take professional assessment. A tool to automatically assess their risk levels in natural settings can increase the opportunity for early intervention. OBJECTIVE: The aim of this study was to explore whether computerized language analysis methods can be utilized to assess one’s suicide risk and emotional distress in Chinese social media. METHODS: A Web-based survey of Chinese social media (ie, Weibo) users was conducted to measure their suicide risk factors including suicide probability, Weibo suicide communication (WSC), depression, anxiety, and stress levels. Participants’ Weibo posts published in the public domain were also downloaded with their consent. The Weibo posts were parsed and fitted into Simplified Chinese-Linguistic Inquiry and Word Count (SC-LIWC) categories. The associations between SC-LIWC features and the 5 suicide risk factors were examined by logistic regression. Furthermore, the support vector machine (SVM) model was applied based on the language features to automatically classify whether a Weibo user exhibited any of the 5 risk factors. RESULTS: A total of 974 Weibo users participated in the survey. Those with high suicide probability were marked by a higher usage of pronoun (odds ratio, OR=1.18, P=.001), prepend words (OR=1.49, P=.02), multifunction words (OR=1.12, P=.04), a lower usage of verb (OR=0.78, P<.001), and a greater total word count (OR=1.007, P=.008). Second-person plural was positively associated with severe depression (OR=8.36, P=.01) and stress (OR=11, P=.005), whereas work-related words were negatively associated with WSC (OR=0.71, P=.008), severe depression (OR=0.56, P=.005), and anxiety (OR=0.77, P=.02). Inconsistently, third-person plural was found to be negatively associated with WSC (OR=0.02, P=.047) but positively with severe stress (OR=41.3, P=.04). Achievement-related words were positively associated with depression (OR=1.68, P=.003), whereas health- (OR=2.36, P=.004) and death-related (OR=2.60, P=.01) words positively associated with stress. The machine classifiers did not achieve satisfying performance in the full sample set but could classify high suicide probability (area under the curve, AUC=0.61, P=.04) and severe anxiety (AUC=0.75, P<.001) among those who have exhibited WSC. CONCLUSIONS: SC-LIWC is useful to examine language markers of suicide risk and emotional distress in Chinese social media and can identify characteristics different from previous findings in the English literature. Some findings are leading to new hypotheses for future verification. Machine classifiers based on SC-LIWC features are promising but still require further optimization for application in real life. JMIR Publications 2017-07-10 /pmc/articles/PMC5525005/ /pubmed/28694239 http://dx.doi.org/10.2196/jmir.7276 Text en ©Qijin Cheng, Tim MH Li, Chi-Leung Kwok, Tingshao Zhu, Paul SF Yip. Originally published in the Journal of Medical Internet Research (http://www.jmir.org), 10.07.2017. https://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in the Journal of Medical Internet Research, is properly cited. The complete bibliographic information, a link to the original publication on http://www.jmir.org/, as well as this copyright and license information must be included. |
spellingShingle | Original Paper Cheng, Qijin Li, Tim MH Kwok, Chi-Leung Zhu, Tingshao Yip, Paul SF Assessing Suicide Risk and Emotional Distress in Chinese Social Media: A Text Mining and Machine Learning Study |
title | Assessing Suicide Risk and Emotional Distress in Chinese Social Media: A Text Mining and Machine Learning Study |
topic | Original Paper |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5525005/ https://www.ncbi.nlm.nih.gov/pubmed/28694239 http://dx.doi.org/10.2196/jmir.7276 |