Identification of the Best Semantic Expansion to Query PubMed Through Automatic Performance Assessment of Four Search Strategies on All Medical Subject Heading Descriptors: Comparative Study
BACKGROUND: With the continuous expansion of available biomedical data, efficient and effective information retrieval has become of utmost importance. Semantic expansion of queries using synonyms may improve information retrieval. OBJECTIVE: The aim of this study was to automatically construct and evaluate expanded PubMed queries for each of the 28,313 Medical Subject Heading (MeSH) descriptors, using different semantic expansion strategies…
Main Authors: Massonnaud, Clément R; Kerdelhué, Gaétan; Grosjean, Julien; Lelong, Romain; Griffon, Nicolas; Darmoni, Stefan J
Format: Online Article Text
Language: English
Published: JMIR Publications, 2020
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7303830/ https://www.ncbi.nlm.nih.gov/pubmed/32496201 http://dx.doi.org/10.2196/12799
_version_ | 1783548143861760000 |
author | Massonnaud, Clément R Kerdelhué, Gaétan Grosjean, Julien Lelong, Romain Griffon, Nicolas Darmoni, Stefan J |
author_facet | Massonnaud, Clément R Kerdelhué, Gaétan Grosjean, Julien Lelong, Romain Griffon, Nicolas Darmoni, Stefan J |
author_sort | Massonnaud, Clément R |
collection | PubMed |
description | BACKGROUND: With the continuous expansion of available biomedical data, efficient and effective information retrieval has become of utmost importance. Semantic expansion of queries using synonyms may improve information retrieval. OBJECTIVE: The aim of this study was to automatically construct and evaluate expanded PubMed queries of the form “preferred term”[MH] OR “preferred term”[TIAB] OR “synonym 1”[TIAB] OR “synonym 2”[TIAB] OR …, for each of the 28,313 Medical Subject Heading (MeSH) descriptors, by using different semantic expansion strategies. We sought to propose an innovative method that could automatically evaluate these strategies, based on the three main metrics used in information science (precision, recall, and F-measure). METHODS: Three semantic expansion strategies were assessed. They differed by the synonyms used to build the queries as follows: MeSH synonyms, Unified Medical Language System (UMLS) mappings, and custom mappings (Catalogue et Index des Sites Médicaux de langue Française [CISMeF]). The precision, recall, and F-measure metrics were automatically computed for the three strategies and for the standard automatic term mapping (ATM) of PubMed. The method to automatically compute the metrics involved computing the number of all relevant citations (A), using National Library of Medicine indexing as the gold standard (“preferred term”[MH]), the number of citations retrieved by the added terms (“synonym 1”[TIAB] OR “synonym 2”[TIAB] OR …) (B), and the number of relevant citations retrieved by the added terms (combining the previous two queries with an “AND” operator) (C). It was possible to programmatically compute the metrics for each strategy using each of the 28,313 MeSH descriptors as a “preferred term,” corresponding to 239,724 different queries built and sent to the PubMed application programming interface. The four search strategies were ranked and compared for each metric.
RESULTS: ATM had the worst performance for all three metrics among the four strategies. The MeSH strategy had the best mean precision (51%, SD 23%). The UMLS strategy had the best recall and F-measure (41%, SD 31% and 36%, SD 24%, respectively). CISMeF had the second-best recall and F-measure (40%, SD 31% and 35%, SD 24%, respectively). However, considering a cutoff of 5%, CISMeF had better precision than UMLS for 1180 descriptors, better recall for 793 descriptors, and better F-measure for 678 descriptors. CONCLUSIONS: This study highlights the importance of using semantic expansion strategies to improve information retrieval. However, the performance of a given strategy relative to another varied greatly depending on the MeSH descriptor. These results confirm there is no ideal search strategy for all descriptors. Different semantic expansions should be used depending on the descriptor and the user’s objectives. Thus, we developed an interface that allows users to input a descriptor and then proposes the best semantic expansion to maximize the three main metrics (precision, recall, and F-measure). |
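The query construction and metric computation described in the methods can be sketched as follows. This is a minimal illustration, not the study's actual code: the descriptor "Asthma" and its synonym list are example inputs, and the helper names are hypothetical. The three counts A, B, and C correspond to the hit counts described above (gold-standard MeSH query, added [TIAB] terms, and their intersection).

```python
# Sketch of the evaluation procedure described in the abstract (assumed
# helper names; "Asthma" and its synonyms are illustrative inputs only).

def build_expanded_query(preferred_term, synonyms):
    """Build a query of the form:
    "term"[MH] OR "term"[TIAB] OR "synonym 1"[TIAB] OR ..."""
    parts = [f'"{preferred_term}"[MH]', f'"{preferred_term}"[TIAB]']
    parts += [f'"{s}"[TIAB]' for s in synonyms]
    return " OR ".join(parts)

def metrics(a, b, c):
    """Compute precision, recall, and F-measure from the three hit counts:
    a = relevant citations ("preferred term"[MH], the gold standard)
    b = citations retrieved by the added [TIAB] terms
    c = relevant citations among those (the two queries combined with AND)."""
    precision = c / b if b else 0.0
    recall = c / a if a else 0.0
    denom = precision + recall
    f_measure = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f_measure
```

For example, `build_expanded_query("Asthma", ["Bronchial Asthma"])` yields `"Asthma"[MH] OR "Asthma"[TIAB] OR "Bronchial Asthma"[TIAB]`; in the study, the hit counts for such queries would come from the PubMed API rather than being supplied by hand.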
format | Online Article Text |
id | pubmed-7303830 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2020 |
publisher | JMIR Publications |
record_format | MEDLINE/PubMed |
spelling | pubmed-73038302020-06-24 Identification of the Best Semantic Expansion to Query PubMed Through Automatic Performance Assessment of Four Search Strategies on All Medical Subject Heading Descriptors: Comparative Study Massonnaud, Clément R Kerdelhué, Gaétan Grosjean, Julien Lelong, Romain Griffon, Nicolas Darmoni, Stefan J JMIR Med Inform Original Paper JMIR Publications 2020-06-04 /pmc/articles/PMC7303830/ /pubmed/32496201 http://dx.doi.org/10.2196/12799 Text en ©Clément R Massonnaud, Gaétan Kerdelhué, Julien Grosjean, Romain Lelong, Nicolas Griffon, Stefan J Darmoni. Originally published in JMIR Medical Informatics (http://medinform.jmir.org), 04.06.2020. This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in JMIR Medical Informatics, is properly cited. The complete bibliographic information, a link to the original publication on http://medinform.jmir.org/, as well as this copyright and license information must be included. |
spellingShingle | Original Paper Massonnaud, Clément R Kerdelhué, Gaétan Grosjean, Julien Lelong, Romain Griffon, Nicolas Darmoni, Stefan J Identification of the Best Semantic Expansion to Query PubMed Through Automatic Performance Assessment of Four Search Strategies on All Medical Subject Heading Descriptors: Comparative Study |
title | Identification of the Best Semantic Expansion to Query PubMed Through Automatic Performance Assessment of Four Search Strategies on All Medical Subject Heading Descriptors: Comparative Study |
title_full | Identification of the Best Semantic Expansion to Query PubMed Through Automatic Performance Assessment of Four Search Strategies on All Medical Subject Heading Descriptors: Comparative Study |
title_fullStr | Identification of the Best Semantic Expansion to Query PubMed Through Automatic Performance Assessment of Four Search Strategies on All Medical Subject Heading Descriptors: Comparative Study |
title_full_unstemmed | Identification of the Best Semantic Expansion to Query PubMed Through Automatic Performance Assessment of Four Search Strategies on All Medical Subject Heading Descriptors: Comparative Study |
title_short | Identification of the Best Semantic Expansion to Query PubMed Through Automatic Performance Assessment of Four Search Strategies on All Medical Subject Heading Descriptors: Comparative Study |
title_sort | identification of the best semantic expansion to query pubmed through automatic performance assessment of four search strategies on all medical subject heading descriptors: comparative study |
topic | Original Paper |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7303830/ https://www.ncbi.nlm.nih.gov/pubmed/32496201 http://dx.doi.org/10.2196/12799 |
work_keys_str_mv | AT massonnaudclementr identificationofthebestsemanticexpansiontoquerypubmedthroughautomaticperformanceassessmentoffoursearchstrategiesonallmedicalsubjectheadingdescriptorscomparativestudy AT kerdelhuegaetan identificationofthebestsemanticexpansiontoquerypubmedthroughautomaticperformanceassessmentoffoursearchstrategiesonallmedicalsubjectheadingdescriptorscomparativestudy AT grosjeanjulien identificationofthebestsemanticexpansiontoquerypubmedthroughautomaticperformanceassessmentoffoursearchstrategiesonallmedicalsubjectheadingdescriptorscomparativestudy AT lelongromain identificationofthebestsemanticexpansiontoquerypubmedthroughautomaticperformanceassessmentoffoursearchstrategiesonallmedicalsubjectheadingdescriptorscomparativestudy AT griffonnicolas identificationofthebestsemanticexpansiontoquerypubmedthroughautomaticperformanceassessmentoffoursearchstrategiesonallmedicalsubjectheadingdescriptorscomparativestudy AT darmonistefanj identificationofthebestsemanticexpansiontoquerypubmedthroughautomaticperformanceassessmentoffoursearchstrategiesonallmedicalsubjectheadingdescriptorscomparativestudy |