Discovering biomedical semantic relations in PubMed queries for information retrieval and database curation

Identifying relevant papers from the literature is a common task in biocuration. Most current biomedical literature search systems primarily rely on matching user keywords. Semantic search, on the other hand, seeks to improve search accuracy by understanding the entities and contextual relations in...

Bibliographic Details
Main Authors: Huang, Chung-Chi, Lu, Zhiyong
Format: Online Article Text
Language: English
Published: Oxford University Press 2016
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4808250/
https://www.ncbi.nlm.nih.gov/pubmed/27016698
http://dx.doi.org/10.1093/database/baw025
_version_ 1782423475937345536
author Huang, Chung-Chi
Lu, Zhiyong
author_facet Huang, Chung-Chi
Lu, Zhiyong
author_sort Huang, Chung-Chi
collection PubMed
description Identifying relevant papers from the literature is a common task in biocuration. Most current biomedical literature search systems rely primarily on matching user keywords. Semantic search, by contrast, seeks to improve search accuracy by understanding the entities and contextual relations in user keywords. However, past research has mostly focused on semantically identifying biological entities (e.g. chemicals, diseases and genes), with little effort on discovering semantic relations. In this work, we aim to discover biomedical semantic relations in PubMed queries in an automated and unsupervised fashion. Specifically, we focus on extracting and understanding the contextual information (or context patterns) that PubMed users employ to represent semantic relations between entities, such as ‘CHEMICAL-1 compared to CHEMICAL-2’. With the advances in automatic named entity recognition, we first tag entities in PubMed queries and then use the tagged entities as knowledge to recognize pattern semantics. More specifically, we transform PubMed queries into context patterns involving participating entities, which are subsequently projected to latent topics via latent semantic analysis (LSA) to avoid data sparseness and specificity issues. Finally, we mine semantically similar context patterns, or semantic relations, based on their LSA topic distributions. Our two separate evaluation experiments on chemical–chemical (CC) and chemical–disease (CD) relations show that the proposed approach significantly outperforms a baseline method, which simply measures pattern semantics by similarity in participating entities. The highest performance achieved by our approach is nearly 0.9 and 0.85 for the CC and CD tasks, respectively, when compared against the ground truth in terms of normalized discounted cumulative gain (nDCG), a standard measure of ranking quality. These results suggest that our approach can effectively identify and return related semantic patterns in a ranked order covering diverse bio-entity relations. To assess the potential utility of our automatically top-ranked patterns for a given relation in semantic search, we performed a pilot study on frequently sought semantic relations in PubMed and observed improved literature retrieval effectiveness based on post-hoc human relevance evaluation. Further investigation in larger tests and in real-world scenarios is warranted.
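The abstract reports ranking quality as nDCG. As a rough illustration of how such a score is computed, the sketch below uses one common formulation (linear gain, log2 rank discount, no rank cutoff). The abstract does not state which nDCG variant the authors used, so the formula choice, the function names and the relevance grades here are all assumptions made for illustration only.

```python
import math
from typing import Sequence


def dcg(relevances: Sequence[float]) -> float:
    """Discounted cumulative gain: the gain at rank r is divided by log2(r + 1)."""
    return sum(
        rel / math.log2(rank + 1)
        for rank, rel in enumerate(relevances, start=1)
    )


def ndcg(relevances: Sequence[float]) -> float:
    """DCG of the given ranking divided by the DCG of the ideal (sorted) ranking."""
    ideal = dcg(sorted(relevances, reverse=True))
    # A ranking with no relevant items has no meaningful ideal; return 0.0.
    return dcg(relevances) / ideal if ideal > 0 else 0.0


# Hypothetical graded judgments (not from the paper) for five ranked patterns,
# listed in the order a system returned them.
ranked_judgments = [3, 2, 3, 0, 1]
print(f"nDCG = {ndcg(ranked_judgments):.3f}")
```

A score of 1.0 means the returned order matches the ideal order of the judged items; placing a highly relevant pattern lower in the list reduces the score.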
format Online
Article
Text
id pubmed-4808250
institution National Center for Biotechnology Information
language English
publishDate 2016
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-4808250 2016-03-29 Discovering biomedical semantic relations in PubMed queries for information retrieval and database curation Huang, Chung-Chi Lu, Zhiyong Database (Oxford) Original Article Oxford University Press 2016-03-25 /pmc/articles/PMC4808250/ /pubmed/27016698 http://dx.doi.org/10.1093/database/baw025 Text en Published by Oxford University Press 2016. This work is written by US Government employees and is in the public domain in the US.
spellingShingle Original Article
Huang, Chung-Chi
Lu, Zhiyong
Discovering biomedical semantic relations in PubMed queries for information retrieval and database curation
title Discovering biomedical semantic relations in PubMed queries for information retrieval and database curation
title_full Discovering biomedical semantic relations in PubMed queries for information retrieval and database curation
title_fullStr Discovering biomedical semantic relations in PubMed queries for information retrieval and database curation
title_full_unstemmed Discovering biomedical semantic relations in PubMed queries for information retrieval and database curation
title_short Discovering biomedical semantic relations in PubMed queries for information retrieval and database curation
title_sort discovering biomedical semantic relations in pubmed queries for information retrieval and database curation
topic Original Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4808250/
https://www.ncbi.nlm.nih.gov/pubmed/27016698
http://dx.doi.org/10.1093/database/baw025
work_keys_str_mv AT huangchungchi discoveringbiomedicalsemanticrelationsinpubmedqueriesforinformationretrievalanddatabasecuration
AT luzhiyong discoveringbiomedicalsemanticrelationsinpubmedqueriesforinformationretrievalanddatabasecuration