
Automatic Construction of a Depression-Domain Lexicon Based on Microblogs: Text Mining Study



Bibliographic Details
Main Authors: Li, Genghao; Li, Bing; Huang, Langlin; Hou, Sibing
Format: Online Article Text
Language: English
Published: JMIR Publications 2020
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7381008/
https://www.ncbi.nlm.nih.gov/pubmed/32574151
http://dx.doi.org/10.2196/17650
_version_ 1783562953906192384
author Li, Genghao
Li, Bing
Huang, Langlin
Hou, Sibing
author_facet Li, Genghao
Li, Bing
Huang, Langlin
Hou, Sibing
author_sort Li, Genghao
collection PubMed
description BACKGROUND: According to a World Health Organization report in 2017, almost 1 in every 20 people in China had depression. However, clinical diagnosis of depression is usually difficult owing to slow observation, high cost, and patient resistance. Meanwhile, with the rapid emergence of social networking sites, people frequently share their daily lives and disclose their inner feelings online, making it possible to identify mental conditions from this rich text information. Much has been achieved with English web-based corpora, but in China the extraction of language features from web-based depression signals is still at a relatively early stage.
OBJECTIVE: The purpose of this study was to propose an effective approach for constructing a depression-domain lexicon. This lexicon will contain language features that could help identify social media users who potentially have depression. Our study also compared detection performance with and without our lexicon.
METHODS: We automatically constructed a depression-domain lexicon using Word2Vec, a semantic relationship graph, and the label propagation algorithm. Used in combination, these methods performed well on a specific corpus during construction. The lexicon was built from 111,052 Weibo microblogs posted by 1868 users who were depressed or nondepressed. For depression detection, we considered six features and used five classification methods to test detection performance.
RESULTS: The experimental results showed that, in terms of the F1 value, our autoconstruction method performed 1% to 6% better than baseline approaches and was more effective and more stable. When applied to detection models such as logistic regression and support vector machine, our lexicon helped the models outperform their lexicon-free counterparts by 2% to 9% and improved the final accuracy of potential depression detection.
CONCLUSIONS: Our depression-domain lexicon proved to be a meaningful input for classification algorithms, providing linguistic insights into the depressive status of test subjects. We believe that this lexicon will enhance early detection of depression in people on social media. Future work will need to be carried out on a larger corpus and with more complex methods.
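The METHODS summary above names three ingredients: word embeddings, a semantic relationship graph, and label propagation. The following is a minimal sketch of how such a pipeline could fit together, not the authors' implementation. The 2-D vectors stand in for Word2Vec embeddings and are invented for illustration, as are the seed words, the similarity threshold, and the function names `build_graph` and `propagate`. The propagation step is a simplified variant that repeatedly averages neighbor scores while keeping seed labels fixed.

```python
import math

# Hypothetical 2-D vectors standing in for Word2Vec embeddings
# (invented for illustration; not taken from the study).
VECTORS = {
    "hopeless": (0.9, 0.1),
    "insomnia": (0.8, 0.3),
    "worthless": (0.85, 0.2),
    "picnic": (0.1, 0.9),
    "holiday": (0.2, 0.8),
    "sunshine": (0.15, 0.95),
}

# Seed labels (assumed): 1.0 = depression-related, 0.0 = unrelated.
SEEDS = {"hopeless": 1.0, "picnic": 0.0}


def cosine(u, v):
    """Cosine similarity between two 2-D vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.hypot(*u) * math.hypot(*v))


def build_graph(vectors, threshold=0.5):
    """Link word pairs whose cosine similarity is at least `threshold`."""
    graph = {word: {} for word in vectors}
    words = list(vectors)
    for i, a in enumerate(words):
        for b in words[i + 1:]:
            sim = cosine(vectors[a], vectors[b])
            if sim >= threshold:
                graph[a][b] = sim
                graph[b][a] = sim
    return graph


def propagate(graph, seeds, iterations=50):
    """Set each non-seed word's score to the similarity-weighted mean of
    its neighbors' scores, repeated for a fixed number of iterations."""
    scores = {word: seeds.get(word, 0.5) for word in graph}
    for _ in range(iterations):
        updated = {}
        for word, neighbors in graph.items():
            if word in seeds or not neighbors:
                updated[word] = scores[word]  # seeds stay clamped
                continue
            total = sum(neighbors.values())
            updated[word] = sum(
                weight * scores[n] for n, weight in neighbors.items()
            ) / total
        scores = updated
    return scores


graph = build_graph(VECTORS)
scores = propagate(graph, SEEDS)
lexicon = sorted(word for word, score in scores.items() if score > 0.5)
print(lexicon)  # → ['hopeless', 'insomnia', 'worthless']
```

In this toy graph, words close to the depression-related seed end up with scores above 0.5 and enter the lexicon, while words close to the neutral seed do not. A real pipeline would train the embeddings on the microblog corpus and choose seeds, threshold, and cutoff empirically.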
format Online
Article
Text
id pubmed-7381008
institution National Center for Biotechnology Information
language English
publishDate 2020
publisher JMIR Publications
record_format MEDLINE/PubMed
spelling pubmed-7381008 2020-08-06 Automatic Construction of a Depression-Domain Lexicon Based on Microblogs: Text Mining Study Li, Genghao Li, Bing Huang, Langlin Hou, Sibing JMIR Med Inform Original Paper JMIR Publications 2020-06-23 /pmc/articles/PMC7381008/ /pubmed/32574151 http://dx.doi.org/10.2196/17650 Text en ©Genghao Li, Bing Li, Langlin Huang, Sibing Hou. Originally published in JMIR Medical Informatics (http://medinform.jmir.org), 23.06.2020. https://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in JMIR Medical Informatics, is properly cited. The complete bibliographic information, a link to the original publication on http://medinform.jmir.org/, as well as this copyright and license information must be included.
spellingShingle Original Paper
Li, Genghao
Li, Bing
Huang, Langlin
Hou, Sibing
Automatic Construction of a Depression-Domain Lexicon Based on Microblogs: Text Mining Study
title Automatic Construction of a Depression-Domain Lexicon Based on Microblogs: Text Mining Study
title_full Automatic Construction of a Depression-Domain Lexicon Based on Microblogs: Text Mining Study
title_fullStr Automatic Construction of a Depression-Domain Lexicon Based on Microblogs: Text Mining Study
title_full_unstemmed Automatic Construction of a Depression-Domain Lexicon Based on Microblogs: Text Mining Study
title_short Automatic Construction of a Depression-Domain Lexicon Based on Microblogs: Text Mining Study
title_sort automatic construction of a depression-domain lexicon based on microblogs: text mining study
topic Original Paper
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7381008/
https://www.ncbi.nlm.nih.gov/pubmed/32574151
http://dx.doi.org/10.2196/17650
work_keys_str_mv AT ligenghao automaticconstructionofadepressiondomainlexiconbasedonmicroblogstextminingstudy
AT libing automaticconstructionofadepressiondomainlexiconbasedonmicroblogstextminingstudy
AT huanglanglin automaticconstructionofadepressiondomainlexiconbasedonmicroblogstextminingstudy
AT housibing automaticconstructionofadepressiondomainlexiconbasedonmicroblogstextminingstudy