Feature engineering for sentiment analysis in e-health forums

INTRODUCTION: Exploiting information in health-related social media services is of great interest for patients, researchers and medical companies. The challenge is, however, to provide easy, quick and relevant access to the vast amount of information that is available. One step towards facilitating...

Bibliographic Details
Main authors:	Carrillo-de-Albornoz, Jorge, Rodríguez Vidal, Javier, Plaza, Laura
Format:	Online Article Text
Language:	English
Published:	Public Library of Science 2018
Subjects:	Research Article
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6264154/ https://www.ncbi.nlm.nih.gov/pubmed/30496232 http://dx.doi.org/10.1371/journal.pone.0207996

_version_	1783375431295041536
author	Carrillo-de-Albornoz, Jorge Rodríguez Vidal, Javier Plaza, Laura
author_facet	Carrillo-de-Albornoz, Jorge Rodríguez Vidal, Javier Plaza, Laura
author_sort	Carrillo-de-Albornoz, Jorge
collection	PubMed
description	INTRODUCTION: Exploiting information in health-related social media services is of great interest for patients, researchers and medical companies. The challenge is, however, to provide easy, quick and relevant access to the vast amount of information that is available. One step towards facilitating information access to online health data is opinion mining. Even though the classification of patient opinions into positive and negative has been previously tackled, most works make use of machine learning methods and bags of words. Our first contribution is an extensive evaluation of different features, including lexical, syntactic, semantic, network-based, sentiment-based and word-embedding features, to represent patient-authored texts for polarity classification. The second contribution of this work is the study of polar facts (i.e. objective information with polar connotations). Traditionally, the presence of polar facts has been neglected and research in polarity classification has been limited to opinionated texts. We demonstrate the existence and importance of polar facts for the polarity classification of health information. MATERIAL AND METHODS: We annotate a set of more than 3500 posts from online health forums on breast cancer, Crohn's disease and different allergies. Each sentence in a post is manually labeled as “experience”, “fact” or “opinion”, and as “positive”, “negative” or “neutral”. Using this data, we train different machine learning algorithms and compare traditional bag-of-words representations with word embeddings in combination with lexical, syntactic, semantic, network-based and emotional properties of texts to automatically classify patient-authored content as positive, negative or neutral. In addition, we experiment with a combination of textual and semantic representations by generating concept embeddings using the UMLS Metathesaurus. RESULTS: We report two main results: first, we find that it is possible to predict the polarity of patient-authored content with very high accuracy (≈ 70 percent) using word embeddings, and that this considerably outperforms more traditional representations like bags of words; and second, when dealing with medical information, negative and positive facts (i.e. objective information) are nearly as frequent as negative and positive opinions and experiences (i.e. subjective information), and their importance for polarity classification is crucial.
format	Online Article Text
id	pubmed-6264154
institution	National Center for Biotechnology Information
language	English
publishDate	2018
publisher	Public Library of Science
record_format	MEDLINE/PubMed
spelling	pubmed-62641542018-12-19 Feature engineering for sentiment analysis in e-health forums Carrillo-de-Albornoz, Jorge Rodríguez Vidal, Javier Plaza, Laura PLoS One Research Article Public Library of Science 2018-11-29 /pmc/articles/PMC6264154/ /pubmed/30496232 http://dx.doi.org/10.1371/journal.pone.0207996 Text en © 2018 Carrillo-de-Albornoz et al http://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
spellingShingle	Research Article Carrillo-de-Albornoz, Jorge Rodríguez Vidal, Javier Plaza, Laura Feature engineering for sentiment analysis in e-health forums
title	Feature engineering for sentiment analysis in e-health forums
title_full	Feature engineering for sentiment analysis in e-health forums
title_fullStr	Feature engineering for sentiment analysis in e-health forums
title_full_unstemmed	Feature engineering for sentiment analysis in e-health forums
title_short	Feature engineering for sentiment analysis in e-health forums
title_sort	feature engineering for sentiment analysis in e-health forums
topic	Research Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6264154/ https://www.ncbi.nlm.nih.gov/pubmed/30496232 http://dx.doi.org/10.1371/journal.pone.0207996
work_keys_str_mv	AT carrillodealbornozjorge featureengineeringforsentimentanalysisinehealthforums AT rodriguezvidaljavier featureengineeringforsentimentanalysisinehealthforums AT plazalaura featureengineeringforsentimentanalysisinehealthforums

Similar Items