Suicide Note Sentiment Classification: A Supervised Approach Augmented by Web Data

OBJECTIVE: To create a sentiment classification system for the Fifth i2b2/VA Challenge Track 2, which can identify thirteen subjective categories and two objective categories. DESIGN: We developed a hybrid system using Support Vector Machine (SVM) classifiers with augmented training data from the In...

Bibliographic Details
Main Authors: Xu, Yan, Wang, Yue, Liu, Jiahua, Tu, Zhuowen, Sun, Jian-Tao, Tsujii, Junichi, Chang, Eric
Format: Online Article Text
Language: English
Published: Libertas Academica 2012
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3409493/
https://www.ncbi.nlm.nih.gov/pubmed/22879758
http://dx.doi.org/10.4137/BII.S8956
author Xu, Yan
Wang, Yue
Liu, Jiahua
Tu, Zhuowen
Sun, Jian-Tao
Tsujii, Junichi
Chang, Eric
author_sort Xu, Yan
collection PubMed
description OBJECTIVE: To create a sentiment classification system for the Fifth i2b2/VA Challenge Track 2 that can identify thirteen subjective categories and two objective categories. DESIGN: We developed a hybrid system using Support Vector Machine (SVM) classifiers with augmented training data from the Internet. Our system consists of three types of classification-based systems: the first uses spanning n-gram features for subjective categories, the second uses bag-of-n-gram features for objective categories, and the third uses pattern matching for infrequent or subtle emotion categories. The spanning n-gram features are selected by a feature selection algorithm that leverages an emotional corpus from weblogs. Special normalization of objective sentences is generalized with shallow parsing and external web knowledge. We utilize three sources of web data: LiveJournal weblogs, which help to improve the feature selection; the eBay List, which assists in special normalization of the information and instructions categories; and the suicide project web, which provides unlabeled data with properties similar to those of suicide notes. MEASUREMENTS: Performance is evaluated by the overall micro-averaged precision, recall and F-measure. RESULT: Our system achieved an overall micro-averaged F-measure of 0.59. Happiness_peacefulness had the highest F-measure, 0.81. We ranked second out of 26 competing teams. CONCLUSION: Our results indicate that classifying fine-grained sentiments at the sentence level is a non-trivial task. It is effective to divide categories into different groups according to their semantic properties. In addition, our system's performance benefits from external knowledge extracted from publicly available web data collected for other purposes; performance can be further enhanced when more training data is available.
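The micro-averaged precision, recall and F-measure named under MEASUREMENTS pool true positives, false positives and false negatives over all sentences and categories before computing the ratios. The following is a minimal sketch of that calculation, not the challenge's official scorer: the function name `micro_prf`, the set-per-sentence representation, and the toy gold and predicted labels are all assumptions made for illustration.

```python
def micro_prf(gold, pred):
    """Micro-averaged precision, recall and F-measure.

    gold, pred: lists of label sets, one set per sentence (a sentence
    may carry zero, one, or several category labels).
    """
    tp = fp = fn = 0
    for g, p in zip(gold, pred):
        tp += len(g & p)   # labels predicted and present in gold
        fp += len(p - g)   # labels predicted but absent from gold
        fn += len(g - p)   # gold labels that were missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f_measure = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f_measure


# Toy data, invented for illustration only (not taken from the challenge corpus).
gold = [{"happiness_peacefulness"}, {"information", "instructions"}, set(), {"instructions"}]
pred = [{"happiness_peacefulness"}, {"information"}, {"instructions"}, {"information"}]

precision, recall, f_measure = micro_prf(gold, pred)
print(precision, recall, f_measure)
```

Because the counts are pooled before dividing, frequent categories dominate the micro-averaged score, which is one reason a single overall figure can sit well below the score of the best individual category.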
format Online
Article
Text
id pubmed-3409493
institution National Center for Biotechnology Information
language English
publishDate 2012
publisher Libertas Academica
record_format MEDLINE/PubMed
spelling pubmed-3409493 2012-08-09 Suicide Note Sentiment Classification: A Supervised Approach Augmented by Web Data Xu, Yan Wang, Yue Liu, Jiahua Tu, Zhuowen Sun, Jian-Tao Tsujii, Junichi Chang, Eric Biomed Inform Insights Original Research Libertas Academica 2012-01-30 /pmc/articles/PMC3409493/ /pubmed/22879758 http://dx.doi.org/10.4137/BII.S8956 Text en © the author(s), publisher and licensee Libertas Academica Ltd. This is an open access article. Unrestricted non-commercial use is permitted provided the original work is properly cited.
title Suicide Note Sentiment Classification: A Supervised Approach Augmented by Web Data
topic Original Research