
Automatic topic identification of health-related messages in online health community using text classification


Bibliographic Details
Main Author: Lu, Yingjie
Format: Online Article Text
Language: English
Published: Springer International Publishing 2013
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3736074/
https://www.ncbi.nlm.nih.gov/pubmed/23961389
http://dx.doi.org/10.1186/2193-1801-2-309
_version_ 1782279736379047936
author Lu, Yingjie
collection PubMed
description To help patients in online health communities obtain the informational and emotional support they need, this paper proposes an approach for automatically identifying the topics of health-related messages, so that patients can reach the messages most relevant to their queries efficiently. The study presents a feature-based classification framework for automatic topic identification. Messages related to a set of predefined topics were first collected from an online health community. Three types of features (n-gram-based features, domain-specific features and sentiment features) were then combined to build four feature sets for representing health-related text. Finally, three text classification techniques, C4.5, Naïve Bayes and SVM, were used to evaluate the topic classification model. A comparison of the feature sets and classification techniques showed that n-gram-based, domain-specific and sentiment features were all effective in distinguishing different types of health-related topics. Feature reduction based on information gain also improved topic classification performance. Among the classification techniques, SVM significantly outperformed C4.5 and Naïve Bayes. The experimental results demonstrated that the proposed approach could identify the topics of online health-related messages efficiently.
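The abstract names n-gram features and information-gain feature reduction without showing either step. The following is a minimal sketch of how those two steps could look, not the author's implementation: the toy messages, the topic labels and all function names are hypothetical, and the domain-specific and sentiment features are left out.

```python
import math
from collections import Counter


def ngrams(text, n_max=2):
    """Return the set of word n-grams (n = 1..n_max) present in a message."""
    tokens = text.lower().split()
    return {
        " ".join(tokens[i:i + n])
        for n in range(1, n_max + 1)
        for i in range(len(tokens) - n + 1)
    }


def entropy(counts):
    """Shannon entropy in bits of a collection of class counts."""
    counts = list(counts)
    total = sum(counts)
    return -sum(c / total * math.log2(c / total) for c in counts if c > 0)


def information_gain(docs, labels):
    """IG(f) = H(class) - H(class | f present or absent), for every feature f."""
    n = len(docs)
    class_counts = Counter(labels)
    base = entropy(class_counts.values())
    present = {}  # feature -> Counter of labels among messages containing it
    for feats, label in zip(docs, labels):
        for f in feats:
            present.setdefault(f, Counter())[label] += 1
    gains = {}
    for f, with_f in present.items():
        n_with = sum(with_f.values())
        n_without = n - n_with
        without_f = class_counts - with_f  # Counter subtraction drops zero counts
        cond = n_with / n * entropy(with_f.values())
        if n_without:
            cond += n_without / n * entropy(without_f.values())
        gains[f] = base - cond
    return gains


def select_top_k(gains, k):
    """Keep the k highest-gain features; ties are broken alphabetically."""
    return sorted(gains, key=lambda f: (-gains[f], f))[:k]


# Hypothetical toy data; the paper's corpus and topic set are not reproduced here.
messages = [
    ("what dose of insulin should I take", "treatment"),
    ("my doctor changed my insulin dose", "treatment"),
    ("feeling scared and alone tonight", "emotional"),
    ("so scared about the results", "emotional"),
]
docs = [ngrams(text) for text, _ in messages]
labels = [label for _, label in messages]
gains = information_gain(docs, labels)
top_features = select_top_k(gains, 3)
```

In a full pipeline the selected features would then be passed to a classifier such as the C4.5, Naïve Bayes or SVM models mentioned in the abstract.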
format Online
Article
Text
id pubmed-3736074
institution National Center for Biotechnology Information
language English
publishDate 2013
publisher Springer International Publishing
record_format MEDLINE/PubMed
spelling pubmed-3736074 2013-08-07 Automatic topic identification of health-related messages in online health community using text classification Lu, Yingjie Springerplus Research Springer International Publishing 2013-07-10 /pmc/articles/PMC3736074/ /pubmed/23961389 http://dx.doi.org/10.1186/2193-1801-2-309 Text en © Lu; licensee Springer. 2013 This article is published under license to BioMed Central Ltd.
This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title Automatic topic identification of health-related messages in online health community using text classification
topic Research