
Finding relevant biomedical datasets: the UC San Diego solution for the bioCADDIE Retrieval Challenge

Bibliographic Details
Main Authors:	Wei, Wei; Ji, Zhanglong; He, Yupeng; Zhang, Kai; Ha, Yuanchi; Li, Qi; Ohno-Machado, Lucila
Format:	Online Article Text
Language:	English
Published:	Oxford University Press 2018
Subjects:	Original Article
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5861401/ https://www.ncbi.nlm.nih.gov/pubmed/29688374 http://dx.doi.org/10.1093/database/bay017

Description
Summary:	The number and diversity of biomedical datasets grew rapidly in the last decade. A large number of datasets are stored in various repositories, with different formats. Existing dataset retrieval systems lack the capability of cross-repository search. As a result, users spend time searching datasets in known repositories, and they typically do not find new repositories. The biomedical and healthcare data discovery index ecosystem (bioCADDIE) team organized a challenge to solicit new indexing and searching strategies for retrieving biomedical datasets across repositories. We describe the work of one team that built a retrieval pipeline and examined its performance. The pipeline used online resources to supplement dataset metadata, automatically generated queries from users’ free-text questions, produced high-quality retrieval results and achieved the highest inferred Normalized Discounted Cumulative Gain among competitors. The results showed that it is a promising solution for cross-database, cross-domain and cross-repository biomedical dataset retrieval. Database URL: https://github.com/w2wei/dataset_retrieval_pipeline

Similar Items