A publicly available benchmark for biomedical dataset retrieval: the reference standard for the 2016 bioCADDIE dataset retrieval challenge

The rapid proliferation of publicly available biomedical datasets has provided abundant resources that are potentially of value as a means to reproduce prior experiments, and to generate and explore novel hypotheses. However, there are a number of barriers to the re-use of such datasets, which are d...

Bibliographic Details
Main Authors:	Cohen, Trevor; Roberts, Kirk; Gururaj, Anupama E.; Chen, Xiaoling; Pournejati, Saeid; Alter, George; Hersh, William R.; Demner-Fushman, Dina; Ohno-Machado, Lucila; Xu, Hua
Format:	Online Article Text
Language:	English
Published:	Oxford University Press 2017
Subjects:	Original Article
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5737202/ https://www.ncbi.nlm.nih.gov/pubmed/29220453 http://dx.doi.org/10.1093/database/bax061

_version_	1783287483545419776
author	Cohen, Trevor Roberts, Kirk Gururaj, Anupama E. Chen, Xiaoling Pournejati, Saeid Alter, George Hersh, William R. Demner-Fushman, Dina Ohno-Machado, Lucila Xu, Hua
author_facet	Cohen, Trevor Roberts, Kirk Gururaj, Anupama E. Chen, Xiaoling Pournejati, Saeid Alter, George Hersh, William R. Demner-Fushman, Dina Ohno-Machado, Lucila Xu, Hua
author_sort	Cohen, Trevor
collection	PubMed
description	The rapid proliferation of publicly available biomedical datasets has provided abundant resources that are potentially of value as a means to reproduce prior experiments, and to generate and explore novel hypotheses. However, there are a number of barriers to the re-use of such datasets, which are distributed across a broad array of dataset repositories, focusing on different data types and indexed using different terminologies. New methods are needed to enable biomedical researchers to locate datasets of interest within this rapidly expanding information ecosystem, and new resources are needed for the formal evaluation of these methods as they emerge. In this paper, we describe the design and generation of a benchmark for information retrieval of biomedical datasets, which was developed and used for the 2016 bioCADDIE Dataset Retrieval Challenge. In the tradition of the seminal Cranfield experiments, and as exemplified by the Text Retrieval Conference (TREC), this benchmark includes a corpus (biomedical datasets), a set of queries, and relevance judgments relating these queries to elements of the corpus. This paper describes the process through which each of these elements was derived, with a focus on those aspects that distinguish this benchmark from typical information retrieval reference sets. Specifically, we discuss the origin of our queries in the context of a larger collaborative effort, the biomedical and healthCAre Data Discovery Index Ecosystem (bioCADDIE) consortium, and the distinguishing features of biomedical dataset retrieval as a task. The resulting benchmark set has been made publicly available to advance research in the area of biomedical dataset retrieval. Database URL: https://biocaddie.org/benchmark-data
format	Online Article Text
id	pubmed-5737202
institution	National Center for Biotechnology Information
language	English
publishDate	2017
publisher	Oxford University Press
record_format	MEDLINE/PubMed
spelling	pubmed-57372022018-01-08 A publicly available benchmark for biomedical dataset retrieval: the reference standard for the 2016 bioCADDIE dataset retrieval challenge Cohen, Trevor Roberts, Kirk Gururaj, Anupama E. Chen, Xiaoling Pournejati, Saeid Alter, George Hersh, William R. Demner-Fushman, Dina Ohno-Machado, Lucila Xu, Hua Database (Oxford) Original Article The rapid proliferation of publicly available biomedical datasets has provided abundant resources that are potentially of value as a means to reproduce prior experiments, and to generate and explore novel hypotheses. However, there are a number of barriers to the re-use of such datasets, which are distributed across a broad array of dataset repositories, focusing on different data types and indexed using different terminologies. New methods are needed to enable biomedical researchers to locate datasets of interest within this rapidly expanding information ecosystem, and new resources are needed for the formal evaluation of these methods as they emerge. In this paper, we describe the design and generation of a benchmark for information retrieval of biomedical datasets, which was developed and used for the 2016 bioCADDIE Dataset Retrieval Challenge. In the tradition of the seminal Cranfield experiments, and as exemplified by the Text Retrieval Conference (TREC), this benchmark includes a corpus (biomedical datasets), a set of queries, and relevance judgments relating these queries to elements of the corpus. This paper describes the process through which each of these elements was derived, with a focus on those aspects that distinguish this benchmark from typical information retrieval reference sets. Specifically, we discuss the origin of our queries in the context of a larger collaborative effort, the biomedical and healthCAre Data Discovery Index Ecosystem (bioCADDIE) consortium, and the distinguishing features of biomedical dataset retrieval as a task. 
The resulting benchmark set has been made publicly available to advance research in the area of biomedical dataset retrieval. Database URL: https://biocaddie.org/benchmark-data Oxford University Press 2017-08-18 /pmc/articles/PMC5737202/ /pubmed/29220453 http://dx.doi.org/10.1093/database/bax061 Text en © The Author(s) 2017. Published by Oxford University Press. http://creativecommons.org/licenses/by/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
spellingShingle	Original Article Cohen, Trevor Roberts, Kirk Gururaj, Anupama E. Chen, Xiaoling Pournejati, Saeid Alter, George Hersh, William R. Demner-Fushman, Dina Ohno-Machado, Lucila Xu, Hua A publicly available benchmark for biomedical dataset retrieval: the reference standard for the 2016 bioCADDIE dataset retrieval challenge
title	A publicly available benchmark for biomedical dataset retrieval: the reference standard for the 2016 bioCADDIE dataset retrieval challenge
title_full	A publicly available benchmark for biomedical dataset retrieval: the reference standard for the 2016 bioCADDIE dataset retrieval challenge
title_fullStr	A publicly available benchmark for biomedical dataset retrieval: the reference standard for the 2016 bioCADDIE dataset retrieval challenge
title_full_unstemmed	A publicly available benchmark for biomedical dataset retrieval: the reference standard for the 2016 bioCADDIE dataset retrieval challenge
title_short	A publicly available benchmark for biomedical dataset retrieval: the reference standard for the 2016 bioCADDIE dataset retrieval challenge
title_sort	publicly available benchmark for biomedical dataset retrieval: the reference standard for the 2016 biocaddie dataset retrieval challenge
topic	Original Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5737202/ https://www.ncbi.nlm.nih.gov/pubmed/29220453 http://dx.doi.org/10.1093/database/bax061
