
Matching health information seekers' queries to medical terms

BACKGROUND: The Internet is a major source of health information but most seekers are not familiar with medical vocabularies. Hence, their searches fail due to bad query formulation. Several methods have been proposed to improve information retrieval: query expansion, syntactic and semantic techniqu...


Bibliographic Details
Main Authors: Soualmia, Lina F, Prieur-Gaston, Elise, Moalla, Zied, Lecroq, Thierry, Darmoni, Stéfan J
Format: Online Article Text
Language: English
Published: BioMed Central 2012
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3439674/
https://www.ncbi.nlm.nih.gov/pubmed/23095521
http://dx.doi.org/10.1186/1471-2105-13-S14-S11
author Soualmia, Lina F
Prieur-Gaston, Elise
Moalla, Zied
Lecroq, Thierry
Darmoni, Stéfan J
author_sort Soualmia, Lina F
collection PubMed
description BACKGROUND: The Internet is a major source of health information, but most seekers are not familiar with medical vocabularies. Hence, their searches fail due to bad query formulation. Several methods have been proposed to improve information retrieval: query expansion, syntactic and semantic techniques or knowledge-based methods. However, it would be useful to clean those queries which are misspelled. In this paper, we propose a simple yet efficient method to correct misspellings of queries submitted by health information seekers to a medical online search tool. METHODS: In addition to query normalizations and exact phonetic term matching, we tested two approximate string comparators: the similarity score function of Stoilos and the normalized Levenshtein edit distance. We propose here to combine them to increase the number of matched medical terms in French. We first took a sample of query logs to determine the thresholds and processing times. In the second run, at a larger scale, we tested different combinations of query normalizations before or after misspelling correction, using the thresholds retained from the first run. RESULTS: According to the total number of suggestions (around 163, the number of the first sample of queries), at a threshold comparator score of 0.3, the normalized Levenshtein edit distance gave the highest F-Measure (88.15%) and at a threshold comparator score of 0.7, the Stoilos function gave the highest F-Measure (84.31%). By combining Levenshtein and Stoilos, the highest F-Measure (80.28%) is obtained with 0.2 and 0.7 thresholds respectively. However, queries are composed of several terms that may be combinations of medical terms, so query normalization and segmentation are required. The highest F-Measure (64.18%) is obtained when this process is performed before spelling correction.
CONCLUSIONS: Despite the widely known high performance of the normalized edit distance of Levenshtein, we show in this paper that its combination with the Stoilos algorithm improved the results for misspelling correction of user queries. Accuracy is improved by combining spelling, phoneme-based information and string normalizations and segmentations into medical terms. These encouraging results have enabled the integration of this method into two projects funded by the French National Research Agency-Technologies for Health Care. The first aims to facilitate the coding process of clinical free texts contained in Electronic Health Records and discharge summaries, whereas the second aims to improve information retrieval through Electronic Health Records.
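One of the two comparators named in the abstract, the normalized Levenshtein edit distance, can be illustrated with a short Python sketch. This is not the authors' implementation. It assumes the distance is normalized by the length of the longer string and that the 0.3 threshold acts as a maximum allowed distance; the function names, the lower-casing step, and the toy vocabulary are hypothetical and chosen only for illustration.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance counting insertions, deletions and substitutions."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution or match
            ))
        previous = current
    return previous[-1]


def normalized_levenshtein(a: str, b: str) -> float:
    """Distance scaled to [0, 1]; ASSUMPTION: divide by the longer length."""
    longest = max(len(a), len(b))
    return levenshtein(a, b) / longest if longest else 0.0


def suggest(query: str, vocabulary: list[str], max_distance: float = 0.3) -> list[str]:
    """Return vocabulary terms within max_distance of the query, closest first.

    ASSUMPTION: the threshold is an upper bound on the normalized distance.
    """
    scored = sorted(
        (normalized_levenshtein(query.lower(), term.lower()), term)
        for term in vocabulary
    )
    return [term for distance, term in scored if distance <= max_distance]


# Toy vocabulary (hypothetical, unaccented for simplicity).
print(suggest("diabette", ["asthme", "diabete", "migraine"]))  # → ['diabete']
```

Here the misspelled query "diabette" is one deletion away from "diabete", giving a normalized distance of 1/8 = 0.125, which falls under the 0.3 threshold; the other two terms are too far away to be suggested.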
format Online
Article
Text
id pubmed-3439674
institution National Center for Biotechnology Information
language English
publishDate 2012
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-3439674 2012-09-17 Matching health information seekers' queries to medical terms Soualmia, Lina F Prieur-Gaston, Elise Moalla, Zied Lecroq, Thierry Darmoni, Stéfan J BMC Bioinformatics Research BioMed Central 2012-09-07 /pmc/articles/PMC3439674/ /pubmed/23095521 http://dx.doi.org/10.1186/1471-2105-13-S14-S11 Text en Copyright ©2012 Soualmia et al.; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title Matching health information seekers' queries to medical terms
topic Research