A Field Sensor: computing the composition and intent of PubMed queries

PubMed® is a search engine providing access to a collection of over 27 million biomedical bibliographic records as of 2017. PubMed processes millions of queries a day, and understanding these queries is one of the main building blocks for successful information retrieval. In this work, we present...

Bibliographic Details
Main Authors: Yeganova, Lana; Kim, Won; Comeau, Donald C; Wilbur, W John; Lu, Zhiyong
Format: Online Article Text
Language: English
Published: Oxford University Press 2018
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6044290/
https://www.ncbi.nlm.nih.gov/pubmed/30010750
http://dx.doi.org/10.1093/database/bay052
_version_ 1783339455370756096
author Yeganova, Lana
Kim, Won
Comeau, Donald C
Wilbur, W John
Lu, Zhiyong
author_facet Yeganova, Lana
Kim, Won
Comeau, Donald C
Wilbur, W John
Lu, Zhiyong
author_sort Yeganova, Lana
collection PubMed
description PubMed® is a search engine providing access to a collection of over 27 million biomedical bibliographic records as of 2017. PubMed processes millions of queries a day, and understanding these queries is one of the main building blocks for successful information retrieval. In this work, we present Field Sensor, a domain-specific tool for understanding the composition and predicting the user intent of PubMed queries. Given a query, the Field Sensor infers a field for each token or sequence of tokens in the query in a multi-step process that includes syntactic chunking, rule-based tagging and probabilistic field prediction. The fields of interest are those associated with the (meta-)data elements of each PubMed record, such as article title, abstract, author name(s), journal title, volume, issue, page and date. We evaluate the accuracy of our algorithm on a human-annotated corpus of 10 000 PubMed queries, as well as on a new machine-annotated set of 103 000 PubMed queries. The Field Sensor achieves an accuracy of 93% and 91% on the two corpora, respectively, and finds that nearly half of all searches are navigational (e.g. author searches and article title searches) and half are informational (e.g. topical searches). The Field Sensor has been integrated into PubMed since June 2017 to detect informational queries, for which results sorted by relevance can be suggested as an alternative to those sorted by date, the default. In addition, the composition of PubMed queries as computed by the Field Sensor proves to be essential for understanding how users query PubMed.
format Online
Article
Text
id pubmed-6044290
institution National Center for Biotechnology Information
language English
publishDate 2018
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-6044290 2018-07-19 A Field Sensor: computing the composition and intent of PubMed queries Yeganova, Lana Kim, Won Comeau, Donald C Wilbur, W John Lu, Zhiyong Database (Oxford) Original Article Oxford University Press 2018-07-12 /pmc/articles/PMC6044290/ /pubmed/30010750 http://dx.doi.org/10.1093/database/bay052 Text en Published by Oxford University Press 2018.
This work is written by US Government employees and is in the public domain in the US. This article is published and distributed under the terms of the Oxford University Press, Standard Journals Publication Model (https://academic.oup.com/journals/pages/about_us/legal/notices)
spellingShingle Original Article
Yeganova, Lana
Kim, Won
Comeau, Donald C
Wilbur, W John
Lu, Zhiyong
A Field Sensor: computing the composition and intent of PubMed queries
title A Field Sensor: computing the composition and intent of PubMed queries
title_full A Field Sensor: computing the composition and intent of PubMed queries
title_fullStr A Field Sensor: computing the composition and intent of PubMed queries
title_full_unstemmed A Field Sensor: computing the composition and intent of PubMed queries
title_short A Field Sensor: computing the composition and intent of PubMed queries
title_sort field sensor: computing the composition and intent of pubmed queries
topic Original Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6044290/
https://www.ncbi.nlm.nih.gov/pubmed/30010750
http://dx.doi.org/10.1093/database/bay052