
A Semi-Supervised Learning Approach to Enhance Health Care Community–Based Question Answering: A Case Study in Alcoholism

BACKGROUND: Community-based question answering (CQA) sites play an important role in addressing health information needs. However, a significant number of posted questions remain unanswered. Automatically answering the posted questions can provide a useful source of information for Web-based health...


Bibliographic Details
Main Authors: Wongchaisuwat, Papis, Klabjan, Diego, Jonnalagadda, Siddhartha Reddy
Format: Online Article Text
Language: English
Published: JMIR Publications 2016
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4987493/
https://www.ncbi.nlm.nih.gov/pubmed/27485666
http://dx.doi.org/10.2196/medinform.5490
author Wongchaisuwat, Papis
Klabjan, Diego
Jonnalagadda, Siddhartha Reddy
collection PubMed
description BACKGROUND: Community-based question answering (CQA) sites play an important role in addressing health information needs. However, a significant number of posted questions remain unanswered. Automatically answering the posted questions can provide a useful source of information for Web-based health communities. OBJECTIVE: In this study, we developed an algorithm to automatically answer health-related questions based on past questions and answers (QA). We also aimed to understand which information embedded within Web-based health content serves as good features for identifying valid answers. METHODS: Our proposed algorithm uses information retrieval techniques to identify candidate answers from resolved QA. To rank these candidates, we implemented a semi-supervised learning algorithm that extracts the best answer to a question. We assessed this approach on a curated corpus from Yahoo! Answers and compared it against a rule-based string similarity baseline. RESULTS: On our dataset, the semi-supervised learning algorithm has an accuracy of 86.2%. Unified Medical Language System (UMLS)–based (health-related) features used in the model enhance the algorithm’s performance by approximately 8%. A reasonably high rate of accuracy is obtained given that the data are considerably noisy. Important features distinguishing a valid answer from an invalid answer include text length, the number of stop words contained in a test question, the distance between the test question and other questions in the corpus, and the number of overlapping health-related terms between questions. CONCLUSIONS: Overall, our automated QA system based on historical QA pairs is shown to be effective according to the dataset in this case study. It was developed for general use in the health care domain and can also be applied to other CQA sites.
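The retrieval step mentioned in METHODS (finding candidate answers among resolved QA pairs) can be illustrated with a minimal sketch. The abstract does not say which retrieval model, similarity measure, or stop-word list the authors used, so everything below is an assumption made for illustration: the function names (tokenize, jaccard, retrieve_candidates), the tiny stop-word set, and the choice of Jaccard token overlap as the string similarity are all hypothetical. It does not implement the semi-supervised ranking stage.

```python
import re

# Hypothetical, deliberately tiny stop-word list (not taken from the paper).
STOP_WORDS = {"a", "an", "the", "is", "are", "to", "of", "i", "my",
              "do", "how", "can", "what", "for", "and", "in"}


def tokenize(text):
    """Lowercase the text, keep alphabetic tokens, and drop stop words."""
    return {t for t in re.findall(r"[a-z]+", text.lower())
            if t not in STOP_WORDS}


def jaccard(a, b):
    """Jaccard overlap of two token sets; 0.0 when both are empty."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def retrieve_candidates(question, resolved_qa, top_k=2):
    """Return up to top_k (score, answer) pairs from past QA pairs.

    resolved_qa is a list of (past_question, answer) tuples. Candidates are
    ordered by the similarity between the new question and each past
    question; pairs with no token overlap are discarded.
    """
    q_tokens = tokenize(question)
    scored = [(jaccard(q_tokens, tokenize(past_q)), answer)
              for past_q, answer in resolved_qa]
    scored.sort(key=lambda item: item[0], reverse=True)
    return [(score, answer) for score, answer in scored[:top_k] if score > 0]


if __name__ == "__main__":
    # Made-up example data, not drawn from the study corpus.
    past = [
        ("How can I stop drinking alcohol?", "Talk to a doctor about treatment options."),
        ("What is the best pizza topping?", "Mushrooms."),
    ]
    for score, answer in retrieve_candidates("How do I stop drinking?", past):
        print(f"{score:.2f}  {answer}")
```

In a pipeline like the one the abstract outlines, the candidates returned here would then be passed to a separate ranking model; that stage is left out because the abstract gives too little detail to sketch it faithfully.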
format Online
Article
Text
id pubmed-4987493
institution National Center for Biotechnology Information
language English
publishDate 2016
publisher JMIR Publications
record_format MEDLINE/PubMed
spelling pubmed-4987493 2016-08-29 JMIR Med Inform Original Paper JMIR Publications 2016-08-02 /pmc/articles/PMC4987493/ /pubmed/27485666 http://dx.doi.org/10.2196/medinform.5490 Text en ©Papis Wongchaisuwat, Diego Klabjan, Siddhartha Reddy Jonnalagadda. Originally published in JMIR Medical Informatics (http://medinform.jmir.org), 02.08.2016. https://creativecommons.org/licenses/by/2.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/2.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in JMIR Medical Informatics, is properly cited. The complete bibliographic information, a link to the original publication on http://medinform.jmir.org/, as well as this copyright and license information must be included.
title A Semi-Supervised Learning Approach to Enhance Health Care Community–Based Question Answering: A Case Study in Alcoholism
topic Original Paper
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4987493/
https://www.ncbi.nlm.nih.gov/pubmed/27485666
http://dx.doi.org/10.2196/medinform.5490