Classifying Chinese Questions Related to Health Care Posted by Consumers Via the Internet

BACKGROUND: In question answering (QA) system development, question classification is crucial for identifying information needs and improving the accuracy of returned answers. Although the questions are domain-specific, they are asked by non-professionals, making the question classification task mor...

Bibliographic Details
Main Authors: Guo, Haihong; Na, Xu; Hou, Li; Li, Jiao
Format: Online Article Text
Language: English
Published: JMIR Publications 2017
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5497072/
https://www.ncbi.nlm.nih.gov/pubmed/28634156
http://dx.doi.org/10.2196/jmir.7156
author Guo, Haihong
Na, Xu
Hou, Li
Li, Jiao
collection PubMed
description BACKGROUND: In question answering (QA) system development, question classification is crucial for identifying information needs and improving the accuracy of returned answers. Although the questions are domain-specific, they are asked by non-professionals, making the question classification task more challenging. OBJECTIVE: This study aimed to classify health care–related questions posted by the general public (Chinese speakers) on the Internet. METHODS: A topic-based classification schema for health-related questions was built by manually annotating randomly selected questions. The kappa statistic was used to measure the interrater reliability of multiple annotation results. Using the above corpus, we developed a machine-learning method to automatically classify these questions into one of the following six classes: Condition Management, Healthy Lifestyle, Diagnosis, Health Provider Choice, Treatment, and Epidemiology. RESULTS: The consumer health question schema was developed with four hierarchical levels of specificity, comprising 48 quaternary categories and 35 annotation rules. The 2000 sample questions were coded with 2000 major codes and 607 minor codes. Using natural language processing techniques, we expressed the Chinese questions as a set of lexical, grammatical, and semantic features. Furthermore, the effective features were selected to improve the question classification performance. From the 6-category classification results, we achieved an average precision of 91.41%, recall of 89.62%, and F1 score of 90.24%. CONCLUSIONS: In this study, we developed an automatic method to classify Chinese questions related to health care posted by the general public. This method enables artificial intelligence (AI) agents to understand Internet users’ information needs on health care.
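Note on the metrics named in the abstract: the sketch below illustrates, on made-up labels, how an interrater kappa and averaged precision, recall, and F1 can be computed. It is not the authors' code. It assumes Cohen's kappa for two annotators (the abstract says only "Kappa statistic") and assumes macro-averaging over classes (the abstract says only "average"). The function names and the toy label lists are hypothetical. The reported F1 of 90.24% is lower than the harmonic mean of the reported precision and recall (about 90.51%), which would be consistent with averaging per-class F1 scores, though the abstract does not state the averaging scheme.

```python
from collections import Counter


def cohen_kappa(a, b):
    """Cohen's kappa for two annotators' label sequences of equal length."""
    if len(a) != len(b) or not a:
        raise ValueError("label sequences must be non-empty and of equal length")
    n = len(a)
    # Observed agreement: fraction of items both annotators labeled identically.
    observed = sum(x == y for x, y in zip(a, b)) / n
    # Chance agreement: product of each annotator's marginal label frequencies.
    count_a, count_b = Counter(a), Counter(b)
    expected = sum(count_a[k] * count_b[k] for k in count_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used a single identical label throughout
    return (observed - expected) / (1 - expected)


def macro_prf(y_true, y_pred):
    """Macro-averaged precision, recall, and F1 over all observed classes."""
    labels = sorted(set(y_true) | set(y_pred))
    precisions, recalls, f1s = [], [], []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(f1)
    k = len(labels)
    return sum(precisions) / k, sum(recalls) / k, sum(f1s) / k


# Hypothetical toy labels drawn from the six class names in the abstract.
annotator_1 = ["Treatment", "Diagnosis", "Treatment", "Epidemiology"]
annotator_2 = ["Treatment", "Diagnosis", "Diagnosis", "Epidemiology"]

kappa = cohen_kappa(annotator_1, annotator_2)
# Treat annotator_1 as the reference labels and annotator_2 as predictions.
macro_p, macro_r, macro_f1 = macro_prf(annotator_1, annotator_2)
print(f"kappa={kappa:.4f} P={macro_p:.4f} R={macro_r:.4f} F1={macro_f1:.4f}")
```

On this toy data the macro F1 (7/9) is lower than the harmonic mean of the macro precision and recall (5/6), the same direction of gap seen in the reported figures.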
format Online
Article
Text
id pubmed-5497072
institution National Center for Biotechnology Information
language English
publishDate 2017
publisher JMIR Publications
record_format MEDLINE/PubMed
spelling pubmed-5497072 2017-07-11 Classifying Chinese Questions Related to Health Care Posted by Consumers Via the Internet Guo, Haihong; Na, Xu; Hou, Li; Li, Jiao J Med Internet Res Original Paper JMIR Publications 2017-06-20 /pmc/articles/PMC5497072/ /pubmed/28634156 http://dx.doi.org/10.2196/jmir.7156 Text en ©Haihong Guo, Xu Na, Li Hou, Jiao Li. Originally published in the Journal of Medical Internet Research (http://www.jmir.org), 20.06.2017. https://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in the Journal of Medical Internet Research, is properly cited. The complete bibliographic information, a link to the original publication on http://www.jmir.org/, as well as this copyright and license information must be included.
title Classifying Chinese Questions Related to Health Care Posted by Consumers Via the Internet
topic Original Paper