
The National Library of Medicine indexer assignment dataset: A new large‐scale dataset for reviewer assignment research

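MEDLINE is the National Library of Medicine's (NLM) journal citation database. It contains over 28 million references to biomedical and life science journal articles, and a key feature of the database is that all articles are indexed with NLM Medical Subject Headings (MeSH). The library employs a team of MeSH indexers, and in recent years they have been asked to index close to 1 million articles per year in order to keep MEDLINE up to date. An important part of the MEDLINE indexing process is the assignment of articles to indexers. High-quality and timely indexing is only possible when articles are assigned to indexers with suitable expertise. This article introduces the NLM indexer assignment dataset: a large dataset of 4.2 million indexer article assignments for articles indexed between 2011 and 2019. The dataset is shown to be a valuable testbed for expert matching and assignment algorithms, and indexer article assignment is also found to be useful domain‐adaptive pre‐training for the closely related task of reviewer assignment.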

Bibliographic details
Main authors: Rae, Alastair R.; Mork, James G.; Demner‐Fushman, Dina
Format: Online, Article, Text
Language: English
Published: John Wiley & Sons, Inc., 2022
Subjects: Research Articles
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9937663/
https://www.ncbi.nlm.nih.gov/pubmed/36819642
http://dx.doi.org/10.1002/asi.24722
Journal: J Assoc Inf Sci Technol (Journal of the Association for Information Science and Technology), Research Articles. Published online 2022-11-08; issue date 2023-02.
Rights: Published 2022. This article is a U.S. Government work and is in the public domain in the USA. Journal of the Association for Information Science and Technology published by Wiley Periodicals LLC on behalf of Association for Information Science and Technology. This is an open access article under the terms of the Creative Commons Attribution-NonCommercial 4.0 License (https://creativecommons.org/licenses/by-nc/4.0/), which permits use, distribution and reproduction in any medium, provided the original work is properly cited and is not used for commercial purposes.