A Field Sensor: computing the composition and intent of PubMed queries
PubMed(®) is a search engine providing access to a collection of over 27 million biomedical bibliographic records as of 2017. PubMed processes millions of queries a day, and understanding these queries is one of the main building blocks for successful information retrieval. In this work, we present...
Main authors: | Yeganova, Lana; Kim, Won; Comeau, Donald C; Wilbur, W John; Lu, Zhiyong |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Oxford University Press, 2018 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6044290/ https://www.ncbi.nlm.nih.gov/pubmed/30010750 http://dx.doi.org/10.1093/database/bay052 |
_version_ | 1783339455370756096 |
---|---|
author | Yeganova, Lana Kim, Won Comeau, Donald C Wilbur, W John Lu, Zhiyong |
author_facet | Yeganova, Lana Kim, Won Comeau, Donald C Wilbur, W John Lu, Zhiyong |
author_sort | Yeganova, Lana |
collection | PubMed |
description | PubMed(®) is a search engine providing access to a collection of over 27 million biomedical bibliographic records as of 2017. PubMed processes millions of queries a day, and understanding these queries is one of the main building blocks for successful information retrieval. In this work, we present Field Sensor, a domain-specific tool for understanding the composition and predicting the user intent of PubMed queries. Given a query, the Field Sensor infers a field for each token or sequence of tokens in the query in a multi-step process that includes syntactic chunking, rule-based tagging and probabilistic field prediction. The fields of interest are those associated with (meta-)data elements of each PubMed record, such as article title, abstract, author name(s), journal title, volume, issue, page and date. We evaluate the accuracy of our algorithm on a human-annotated corpus of 10 000 PubMed queries, as well as a new machine-annotated set of 103 000 PubMed queries. The Field Sensor achieves an accuracy of 93% and 91% on the two corpora, respectively, and finds that nearly half of all searches are navigational (e.g. author searches, article title searches, etc.) and half are informational (e.g. topical searches). The Field Sensor has been integrated into PubMed since June 2017 to detect informational queries for which results sorted by relevance can be suggested as an alternative to the default date sort. In addition, the composition of PubMed queries as computed by the Field Sensor proves to be essential for understanding how users query PubMed. |
format | Online Article Text |
id | pubmed-6044290 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2018 |
publisher | Oxford University Press |
record_format | MEDLINE/PubMed |
spelling | pubmed-6044290 2018-07-19 A Field Sensor: computing the composition and intent of PubMed queries Yeganova, Lana Kim, Won Comeau, Donald C Wilbur, W John Lu, Zhiyong Database (Oxford) Original Article Oxford University Press 2018-07-12 /pmc/articles/PMC6044290/ /pubmed/30010750 http://dx.doi.org/10.1093/database/bay052 Text en Published by Oxford University Press 2018. This work is written by US Government employees and is in the public domain in the US. This article is published and distributed under the terms of the Oxford University Press, Standard Journals Publication Model (https://academic.oup.com/journals/pages/about_us/legal/notices) |
spellingShingle | Original Article Yeganova, Lana Kim, Won Comeau, Donald C Wilbur, W John Lu, Zhiyong A Field Sensor: computing the composition and intent of PubMed queries |
title | A Field Sensor: computing the composition and intent of PubMed queries |
title_full | A Field Sensor: computing the composition and intent of PubMed queries |
title_fullStr | A Field Sensor: computing the composition and intent of PubMed queries |
title_full_unstemmed | A Field Sensor: computing the composition and intent of PubMed queries |
title_short | A Field Sensor: computing the composition and intent of PubMed queries |
title_sort | field sensor: computing the composition and intent of pubmed queries |
topic | Original Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6044290/ https://www.ncbi.nlm.nih.gov/pubmed/30010750 http://dx.doi.org/10.1093/database/bay052 |
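
The description above names rule-based tagging as one step of the Field Sensor and distinguishes navigational from informational queries. The following is a minimal sketch of that idea only, not the authors' implementation: the regular expressions, the stopword list, the label names and the functions `tag_tokens` and `guess_intent` are all hypothetical, and the syntactic chunking and probabilistic field prediction steps are left out entirely.

```python
import re

# Hypothetical rules for illustration; the record does not list the real ones.
YEAR = re.compile(r"(18|19|20)\d{2}")   # four-digit publication year
PAGES = re.compile(r"\d+-\d+")          # page range such as 345-350
NUMBER = re.compile(r"\d+")             # bare number: volume or issue
INITIALS = re.compile(r"[a-z]{1,2}")    # one or two letters, e.g. "z", "wj"
STOPWORDS = {"a", "an", "as", "at", "by", "in", "is", "of", "on", "or", "to"}


def tag_tokens(query):
    """Assign a coarse field label to each whitespace-separated token."""
    tags = []
    for tok in query.lower().split():
        if YEAR.fullmatch(tok):
            tags.append((tok, "date"))
        elif PAGES.fullmatch(tok):
            tags.append((tok, "page"))
        elif NUMBER.fullmatch(tok):
            tags.append((tok, "volume_or_issue"))
        elif (INITIALS.fullmatch(tok) and tok not in STOPWORDS
              and tags and tags[-1][1] == "text"):
            # A short non-stopword token after a plain word is read as author
            # initials, and the preceding word is relabelled as the surname.
            tags[-1] = (tags[-1][0], "author")
            tags.append((tok, "author"))
        else:
            tags.append((tok, "text"))
    return tags


def guess_intent(tags):
    """Call a query informational if every token is plain text."""
    labels = {label for _, label in tags}
    return "informational" if labels <= {"text"} else "navigational"


if __name__ == "__main__":
    for q in ["lu z 2018", "breast cancer treatment", "wilbur wj 12 345-350"]:
        tagged = tag_tokens(q)
        print(q, tagged, guess_intent(tagged))
```

Hand-written rules like these are brittle on their own (a title word can look like a surname, and a bare number is ambiguous between volume and issue), which is presumably why the described system combines rules with a probabilistic prediction step.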