Click-words: learning to predict document keywords from a user perspective

Motivation: Recognizing words that are key to a document is important for ranking relevant scientific documents. Traditionally, important words in a document are either nominated subjectively by authors and indexers or selected objectively by some statistical measures. As an alternative, we propose...

Bibliographic Details
Main authors: Islamaj Doğan, Rezarta, Lu, Zhiyong
Format: Text
Language: English
Published: Oxford University Press 2010
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2958742/
https://www.ncbi.nlm.nih.gov/pubmed/20810602
http://dx.doi.org/10.1093/bioinformatics/btq459
author Islamaj Doğan, Rezarta
Lu, Zhiyong
collection PubMed
description Motivation: Recognizing words that are key to a document is important for ranking relevant scientific documents. Traditionally, important words in a document are either nominated subjectively by authors and indexers or selected objectively by some statistical measures. As an alternative, we propose to use documents' words popularity in user queries to identify click-words, a set of prominent words from the users' perspective. Although they often overlap, click-words differ significantly from other document keywords. Results: We developed a machine learning approach to learn the unique characteristics of click-words. Each word was represented by a set of features that included different types of information, such as semantic type, part of speech tag, term frequency–inverse document frequency (TF–IDF) weight and location in the abstract. We identified the most important features and evaluated our model using 6 months of PubMed click-through logs. Our results suggest that, in addition to carrying high TF–IDF weight, click-words tend to be biomedical entities, to exist in article titles, and to occur repeatedly in article abstracts. Given the abstract and title of a document, we are able to accurately predict the words likely to appear in user queries that lead to document clicks. Contact: luzh@ncbi.nlm.nih.gov Supplementary information: Supplementary data are available at Bioinformatics online.
format Text
id pubmed-2958742
institution National Center for Biotechnology Information
language English
publishDate 2010
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-2958742 2010-10-22 Click-words: learning to predict document keywords from a user perspective Islamaj Doğan, Rezarta Lu, Zhiyong Bioinformatics Original Paper
Oxford University Press 2010-11-01 2010-09-01 /pmc/articles/PMC2958742/ /pubmed/20810602 http://dx.doi.org/10.1093/bioinformatics/btq459 Text en http://creativecommons.org/licenses/by-nc/2.0/uk/ This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/2.5), which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.
title Click-words: learning to predict document keywords from a user perspective
topic Original Paper
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2958742/
https://www.ncbi.nlm.nih.gov/pubmed/20810602
http://dx.doi.org/10.1093/bioinformatics/btq459