
Beyond opinion classification: Extracting facts, opinions and experiences from health forums



Bibliographic Details
Main Authors: Carrillo-de-Albornoz, Jorge; Aker, Ahmet; Kurtic, Emina; Plaza, Laura
Format: Online Article Text
Language: English
Published: Public Library of Science 2019
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6326476/
https://www.ncbi.nlm.nih.gov/pubmed/30625206
http://dx.doi.org/10.1371/journal.pone.0209961
author Carrillo-de-Albornoz, Jorge
Aker, Ahmet
Kurtic, Emina
Plaza, Laura
collection PubMed
description INTRODUCTION: Surveys indicate that patients, particularly those suffering from chronic conditions, benefit strongly from the information found in social networks and online forums. One challenge in accessing online health information is differentiating between factual and more subjective information. In this work, we evaluate the feasibility of exploiting lexical, syntactic, semantic, network-based and emotional properties of texts to automatically classify patient-generated content into three types: “experiences”, “facts” and “opinions”, using machine learning algorithms. Our goal is to develop automatic methods that make online health information more accessible and useful for patients, professionals and researchers. MATERIAL AND METHODS: We work with a set of 3000 posts from online health forums on breast cancer, Crohn's disease and different allergies. Each sentence in a post is manually labeled as “experience”, “fact” or “opinion”. Using these data, we train a support vector machine to perform the classification. The results are evaluated with 10-fold cross-validation. RESULTS: Overall, we find that it is possible to predict the type of information contained in a forum post with high accuracy (over 80 percent) using simple text representations such as word embeddings and bags of words. We also analyze more complex features, such as those based on network properties, the polarity of words and the verbal tense of the sentences, and show that, when combined with the simpler representations, they can further improve the results.
format Online
Article
Text
id pubmed-6326476
institution National Center for Biotechnology Information
language English
publishDate 2019
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-6326476 2019-01-19 PLoS One Research Article
Public Library of Science 2019-01-09 /pmc/articles/PMC6326476/ /pubmed/30625206 http://dx.doi.org/10.1371/journal.pone.0209961 Text en © 2019 Carrillo-de-Albornoz et al. http://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
title Beyond opinion classification: Extracting facts, opinions and experiences from health forums
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6326476/
https://www.ncbi.nlm.nih.gov/pubmed/30625206
http://dx.doi.org/10.1371/journal.pone.0209961
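
The description field above summarizes a pipeline of sentence-level labels, bag-of-words features, a support vector machine and 10-fold cross-validation. The following is a minimal Python sketch of that general idea, not the authors' implementation. Everything in it is an assumption made for illustration: the function names, the toy sentences (invented here, not taken from the paper's corpus), the whitespace tokenizer, the one-vs-rest Pegasos-style training loop and its hyperparameters. The record does not say which SVM implementation or settings were used.

```python
import random
from collections import Counter

LABELS = ("experience", "fact", "opinion")

# Hypothetical toy sentences, invented for this sketch only.
TOY_SENTENCES = [
    "i took the tablets for two weeks", "my rash came back after the treatment",
    "i was diagnosed last year", "my doctor changed my dose yesterday",
    "the drug is approved for adults", "the disease affects the intestine",
    "the allergy is caused by pollen", "the therapy is given every month",
    "i think this diet is useless", "i believe the new clinic is better",
    "in my opinion surgery is too risky", "i feel that doctors should listen more",
]
TOY_LABELS = ["experience"] * 4 + ["fact"] * 4 + ["opinion"] * 4


def bag_of_words(sentence):
    """Map lowercased whitespace tokens to their counts."""
    return Counter(sentence.lower().split())


def train_linear_svm(examples, epochs=20, lam=0.01, seed=0):
    """One-vs-rest linear SVM trained with Pegasos-style hinge-loss SGD.

    examples is a list of (feature Counter, label) pairs.
    Returns a dict mapping each label to its weight dict.
    """
    rng = random.Random(seed)
    weights = {label: {} for label in LABELS}
    order = list(examples)
    step = 0
    for _ in range(epochs):
        rng.shuffle(order)
        for feats, gold in order:
            step += 1
            eta = 1.0 / (lam * step)  # decaying learning rate
            for label, w in weights.items():
                y = 1.0 if label == gold else -1.0
                margin = y * sum(w.get(f, 0.0) * v for f, v in feats.items())
                for f in w:  # L2 regularization shrinks every weight
                    w[f] *= 1.0 - eta * lam
                if margin < 1.0:  # hinge loss is active: move toward the example
                    for f, v in feats.items():
                        w[f] = w.get(f, 0.0) + eta * y * v
    return weights


def predict(weights, feats):
    """Return the label whose linear score is highest."""
    def score(label):
        w = weights[label]
        return sum(w.get(f, 0.0) * v for f, v in feats.items())
    return max(LABELS, key=score)


def k_fold_indices(n, k):
    """Split range(n) into k disjoint folds, assigned round-robin."""
    return [list(range(i, n, k)) for i in range(k)]


def cross_validate(sentences, labels, k=10):
    """Return accuracy pooled over k held-out folds."""
    data = [(bag_of_words(s), y) for s, y in zip(sentences, labels)]
    correct = 0
    for fold in k_fold_indices(len(data), k):
        held_out = set(fold)
        train = [ex for i, ex in enumerate(data) if i not in held_out]
        model = train_linear_svm(train)
        correct += sum(predict(model, data[i][0]) == data[i][1] for i in fold)
    return correct / len(data)


if __name__ == "__main__":
    print("10-fold accuracy on toy data:", cross_validate(TOY_SENTENCES, TOY_LABELS))
```

With only twelve invented sentences, the accuracy this prints says nothing about the results reported in the record. The sketch only shows how the pieces fit together: each sentence becomes a sparse count vector, one linear classifier is trained per label, and every sentence is scored exactly once while held out of training.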