Beyond opinion classification: Extracting facts, opinions and experiences from health forums
INTRODUCTION: Surveys indicate that patients, particularly those suffering from chronic conditions, strongly benefit from the information found in social networks and online forums. One challenge in accessing online health information is to differentiate between factual and more subjective informati...
Main authors: | Carrillo-de-Albornoz, Jorge; Aker, Ahmet; Kurtic, Emina; Plaza, Laura |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Public Library of Science 2019 |
Subjects: | Research Article |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6326476/ https://www.ncbi.nlm.nih.gov/pubmed/30625206 http://dx.doi.org/10.1371/journal.pone.0209961 |
_version_ | 1783386304865632256 |
---|---|
author | Carrillo-de-Albornoz, Jorge Aker, Ahmet Kurtic, Emina Plaza, Laura |
author_facet | Carrillo-de-Albornoz, Jorge Aker, Ahmet Kurtic, Emina Plaza, Laura |
author_sort | Carrillo-de-Albornoz, Jorge |
collection | PubMed |
description | INTRODUCTION: Surveys indicate that patients, particularly those suffering from chronic conditions, strongly benefit from the information found in social networks and online forums. One challenge in accessing online health information is to differentiate between factual and more subjective information. In this work, we evaluate the feasibility of exploiting lexical, syntactic, semantic, network-based and emotional properties of texts to automatically classify patient-generated contents into three types: “experiences”, “facts” and “opinions”, using machine learning algorithms. In this context, our goal is to develop automatic methods that will make online health information more easily accessible and useful for patients, professionals and researchers. MATERIAL AND METHODS: We work with a set of 3000 posts to online health forums in breast cancer, morbus crohn and different allergies. Each sentence in a post is manually labeled as “experience”, “fact” or “opinion”. Using this data, we train a support vector machine algorithm to perform classification. The results are evaluated in a 10-fold cross validation procedure. RESULTS: Overall, we find that it is possible to predict the type of information contained in a forum post with a very high accuracy (over 80 percent) using simple text representations such as word embeddings and bags of words. We also analyze more complex features such as those based on the network properties, the polarity of words and the verbal tense of the sentences and show that, when combined with the previous ones, they can boost the results. |
format | Online Article Text |
id | pubmed-6326476 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2019 |
publisher | Public Library of Science |
record_format | MEDLINE/PubMed |
spelling | pubmed-63264762019-01-19 Beyond opinion classification: Extracting facts, opinions and experiences from health forums Carrillo-de-Albornoz, Jorge Aker, Ahmet Kurtic, Emina Plaza, Laura PLoS One Research Article INTRODUCTION: Surveys indicate that patients, particularly those suffering from chronic conditions, strongly benefit from the information found in social networks and online forums. One challenge in accessing online health information is to differentiate between factual and more subjective information. In this work, we evaluate the feasibility of exploiting lexical, syntactic, semantic, network-based and emotional properties of texts to automatically classify patient-generated contents into three types: “experiences”, “facts” and “opinions”, using machine learning algorithms. In this context, our goal is to develop automatic methods that will make online health information more easily accessible and useful for patients, professionals and researchers. MATERIAL AND METHODS: We work with a set of 3000 posts to online health forums in breast cancer, morbus crohn and different allergies. Each sentence in a post is manually labeled as “experience”, “fact” or “opinion”. Using this data, we train a support vector machine algorithm to perform classification. The results are evaluated in a 10-fold cross validation procedure. RESULTS: Overall, we find that it is possible to predict the type of information contained in a forum post with a very high accuracy (over 80 percent) using simple text representations such as word embeddings and bags of words. We also analyze more complex features such as those based on the network properties, the polarity of words and the verbal tense of the sentences and show that, when combined with the previous ones, they can boost the results. 
Public Library of Science 2019-01-09 /pmc/articles/PMC6326476/ /pubmed/30625206 http://dx.doi.org/10.1371/journal.pone.0209961 Text en © 2019 Carrillo-de-Albornoz et al http://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/) , which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. |
spellingShingle | Research Article Carrillo-de-Albornoz, Jorge Aker, Ahmet Kurtic, Emina Plaza, Laura Beyond opinion classification: Extracting facts, opinions and experiences from health forums |
title | Beyond opinion classification: Extracting facts, opinions and experiences from health forums |
title_full | Beyond opinion classification: Extracting facts, opinions and experiences from health forums |
title_fullStr | Beyond opinion classification: Extracting facts, opinions and experiences from health forums |
title_full_unstemmed | Beyond opinion classification: Extracting facts, opinions and experiences from health forums |
title_short | Beyond opinion classification: Extracting facts, opinions and experiences from health forums |
title_sort | beyond opinion classification: extracting facts, opinions and experiences from health forums |
topic | Research Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6326476/ https://www.ncbi.nlm.nih.gov/pubmed/30625206 http://dx.doi.org/10.1371/journal.pone.0209961 |
work_keys_str_mv | AT carrillodealbornozjorge beyondopinionclassificationextractingfactsopinionsandexperiencesfromhealthforums AT akerahmet beyondopinionclassificationextractingfactsopinionsandexperiencesfromhealthforums AT kurticemina beyondopinionclassificationextractingfactsopinionsandexperiencesfromhealthforums AT plazalaura beyondopinionclassificationextractingfactsopinionsandexperiencesfromhealthforums |
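
The description above mentions bag-of-words text representations and evaluation by k-fold cross-validation. The following is a minimal, standard-library-only sketch of that evaluation loop, not the authors' implementation. The classifier is a simple word-overlap scorer standing in for the support vector machine, the example sentences are invented for illustration, and all function names are hypothetical. The demo uses 3 folds because the toy set is tiny; the record describes 10-fold cross-validation.

```python
from collections import Counter


def bag_of_words(sentence):
    """Map a sentence to lowercase whitespace-token counts."""
    return Counter(sentence.lower().split())


def k_fold_indices(n_items, k):
    """Yield (train, test) index lists; fold f tests items with index % k == f."""
    for fold in range(k):
        test = [i for i in range(n_items) if i % k == fold]
        train = [i for i in range(n_items) if i % k != fold]
        yield train, test


def train_centroids(samples):
    """Sum the bag-of-words counts per label (a stand-in for SVM training)."""
    centroids = {}
    for text, label in samples:
        centroids.setdefault(label, Counter()).update(bag_of_words(text))
    return centroids


def predict(centroids, sentence):
    """Return the label whose summed counts overlap most with the sentence."""
    bow = bag_of_words(sentence)

    def score(label):
        counts = centroids[label]
        total = sum(counts.values()) or 1  # normalize by label vocabulary mass
        return sum(bow[w] * counts[w] for w in bow) / total

    # sorted() makes tie-breaking deterministic (alphabetical by label)
    return max(sorted(centroids), key=score)


def cross_validate(samples, k):
    """Return overall accuracy across k folds."""
    correct = 0
    for train_idx, test_idx in k_fold_indices(len(samples), k):
        model = train_centroids([samples[i] for i in train_idx])
        correct += sum(
            predict(model, samples[i][0]) == samples[i][1] for i in test_idx
        )
    return correct / len(samples)


# Invented toy sentences, one label each (hypothetical, not from the paper's data).
SAMPLES = [
    ("i felt tired after my second treatment", "experience"),
    ("crohn is a chronic inflammatory disease", "fact"),
    ("i think this diet is the best option", "opinion"),
    ("allergies are immune reactions to harmless substances", "fact"),
    ("i believe doctors should listen more", "opinion"),
    ("i had a rash after eating peanuts", "experience"),
]

if __name__ == "__main__":
    print("accuracy:", cross_validate(SAMPLES, k=3))
```

With a real corpus, the word-overlap scorer would be replaced by an actual SVM implementation and `k` set to 10; the fold-splitting and accuracy bookkeeping stay the same.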