
Text Mining and Natural Language Processing Approaches for Automatic Categorization of Lay Requests to Web-Based Expert Forums

BACKGROUND: Both healthy and sick people increasingly use electronic media to obtain medical information and advice. For example, Internet users may send requests to Web-based expert forums, or so-called “ask the doctor” services. OBJECTIVE: To automatically classify lay requests to an Internet medi...

Bibliographic Details
Main Authors:	Himmel, Wolfgang; Reincke, Ulrich; Michelmann, Hans Wilhelm
Format:	Text
Language:	English
Published:	Gunther Eysenbach 2009
Subjects:	Original Paper
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2762848/ https://www.ncbi.nlm.nih.gov/pubmed/19632978 http://dx.doi.org/10.2196/jmir.1123

_version_	1782172957037035520
author	Himmel, Wolfgang; Reincke, Ulrich; Michelmann, Hans Wilhelm
author_sort	Himmel, Wolfgang
collection	PubMed
description	BACKGROUND: Both healthy and sick people increasingly use electronic media to obtain medical information and advice. For example, Internet users may send requests to Web-based expert forums, or so-called “ask the doctor” services. OBJECTIVE: To automatically classify lay requests to an Internet medical expert forum using a combination of different text-mining strategies. METHODS: We first manually classified a sample of 988 requests directed to an involuntary childlessness forum on the German website “Rund ums Baby” (“Everything about Babies”) into one or more of 38 categories belonging to two dimensions (“subject matter” and “expectations”). After creating start and synonym lists, we calculated the average Cramér’s V statistic for the association of each word with each category. We also used principal component analysis and singular value decomposition as further text-mining strategies. With these measures we trained regression models and determined, on the basis of the best regression models, the probability of any request belonging to each of the 38 different categories, with a cutoff of 50%. Recall and precision on a test sample were calculated as a measure of quality for the automatic classification. RESULTS: According to the manual classification of 988 documents, 102 (10%) documents fell into the category “in vitro fertilization (IVF),” 81 (8%) into the category “ovulation,” 79 (8%) into “cycle,” and 57 (6%) into “semen analysis.” These were the four most frequent categories in the subject matter dimension (consisting of 32 categories). The expectation dimension comprised six categories; we classified 533 documents (54%) as “general information” and 351 (36%) as a wish for “treatment recommendations.” The generation of indicator variables based on the chi-square analysis and Cramér’s V proved to be the best approach for automatic classification in about half of the categories. In combination with the two other approaches, 100% precision and 100% recall were realized in 18 (47%) out of the 38 categories in the test sample. For 35 (92%) categories, precision and recall were better than 80%. For some categories, the input variables (ie, “words”) also included variables from other categories, most often with a negative sign. For example, absence of words predictive for “menstruation” was a strong indicator for the category “pregnancy test.” CONCLUSIONS: Our approach suggests a way of automatically classifying and analyzing unstructured information in Internet expert forums. The technique can perform a preliminary categorization of new requests and help Internet medical experts to better handle the mass of information and to give professional feedback.
format	Text
id	pubmed-2762848
institution	National Center for Biotechnology Information
language	English
publishDate	2009
publisher	Gunther Eysenbach
record_format	MEDLINE/PubMed
spelling	pubmed-2762848 2009-10-19 Text Mining and Natural Language Processing Approaches for Automatic Categorization of Lay Requests to Web-Based Expert Forums Himmel, Wolfgang; Reincke, Ulrich; Michelmann, Hans Wilhelm J Med Internet Res Original Paper Gunther Eysenbach 2009-07-22 /pmc/articles/PMC2762848/ /pubmed/19632978 http://dx.doi.org/10.2196/jmir.1123 Text en © Wolfgang Himmel, Ulrich Reincke, Hans Wilhelm Michelmann. Originally published in the Journal of Medical Internet Research (http://www.jmir.org), 22.07.2009. http://creativecommons.org/licenses/by/2.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in the Journal of Medical Internet Research, is properly cited. The complete bibliographic information, a link to the original publication on http://www.jmir.org/, as well as this copyright and license information must be included.
spellingShingle	Original Paper Himmel, Wolfgang Reincke, Ulrich Michelmann, Hans Wilhelm Text Mining and Natural Language Processing Approaches for Automatic Categorization of Lay Requests to Web-Based Expert Forums
title	Text Mining and Natural Language Processing Approaches for Automatic Categorization of Lay Requests to Web-Based Expert Forums
topic	Original Paper
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2762848/ https://www.ncbi.nlm.nih.gov/pubmed/19632978 http://dx.doi.org/10.2196/jmir.1123
work_keys_str_mv	AT himmelwolfgang textminingandnaturallanguageprocessingapproachesforautomaticcategorizationoflayrequeststowebbasedexpertforums AT reinckeulrich textminingandnaturallanguageprocessingapproachesforautomaticcategorizationoflayrequeststowebbasedexpertforums AT michelmannhanswilhelm textminingandnaturallanguageprocessingapproachesforautomaticcategorizationoflayrequeststowebbasedexpertforums
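
The abstract names two computations that are easy to illustrate: Cramér's V for the association between a word and a category, and precision and recall at a 50% probability cutoff. The following is a minimal sketch under stated assumptions, not the authors' implementation (the abstract does not describe their software or code). The function names, the toy documents, and the example probabilities are all invented for illustration, and treating a probability of exactly 0.5 as a positive prediction is an assumption.

```python
import math


def cramers_v_2x2(a, b, c, d):
    """Cramér's V for the 2x2 table [[a, b], [c, d]], no continuity correction.

    For a 2x2 table, V reduces to sqrt(chi2 / n).
    """
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0  # an empty row or column carries no association
    chi2 = n * (a * d - b * c) ** 2 / denom
    return math.sqrt(chi2 / n)


def word_category_v(docs, labels, word):
    """Cramér's V between presence of `word` and membership in one category."""
    a = b = c = d = 0
    for tokens, in_category in zip(docs, labels):
        has_word = word in tokens
        if has_word and in_category:
            a += 1
        elif has_word:
            b += 1
        elif in_category:
            c += 1
        else:
            d += 1
    return cramers_v_2x2(a, b, c, d)


def precision_recall(probs, labels, cutoff=0.5):
    """Precision and recall when probs >= cutoff counts as a positive (assumed)."""
    predicted = [p >= cutoff for p in probs]
    tp = sum(1 for p, y in zip(predicted, labels) if p and y)
    fp = sum(1 for p, y in zip(predicted, labels) if p and not y)
    fn = sum(1 for p, y in zip(predicted, labels) if not p and y)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Invented toy data: four tokenized requests and a hypothetical "IVF" label.
docs = [{"ivf", "clinic"}, {"ivf", "embryo"}, {"cycle", "ovulation"}, {"cycle", "test"}]
labels = [True, True, False, False]

v_ivf = word_category_v(docs, labels, "ivf")
v_test = word_category_v(docs, labels, "test")
scores = precision_recall([0.9, 0.4, 0.7, 0.1], labels)
```

Because Cramér's V is unsigned, a word that reliably signals absence from a category scores as high as one that signals membership, which is consistent with the abstract's remark that some predictors entered the models with a negative sign.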
