
Baseline and extensions approach to information retrieval of complex medical data: Poznan's approach to the bioCADDIE 2016


Bibliographic details
Main authors: Cieslewicz, Artur; Dutkiewicz, Jakub; Jedrzejek, Czeslaw
Format: Online article (text)
Language: English
Published: Oxford University Press, 2018
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5846287/
https://www.ncbi.nlm.nih.gov/pubmed/29688372
http://dx.doi.org/10.1093/database/bax103
Abstract: Information retrieval from biomedical repositories has become a challenging task because of their increasing size and complexity. To facilitate research aimed at improving the search for relevant documents, various information retrieval challenges have been launched. In this article, we present the improved medical information retrieval systems designed by Poznan University of Technology and Poznan University of Medical Sciences as a contribution to the bioCADDIE 2016 challenge, a task focusing on information retrieval from a collection of 794 992 datasets generated from 20 biomedical repositories. The system developed by our team uses the Terrier 4.2 search platform enhanced by a query expansion method based on word embeddings. After post-challenge modifications and improvements (particularly in assigning proper weights to original and expanded terms), this approach achieved an infNDCG of 0.4539, the second-best value when compared with the challenge results, and an infAP of 0.3978. This demonstrates that proper use of word embeddings can be a valuable addition to the information retrieval process. We also provide some analysis of related work involving other bioCADDIE contributions and discuss the possibility of improving our results by using better word embedding schemes to find candidates for query expansion. Database URL: https://biocaddie.org/benchmark-data
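The general idea of embedding-based query expansion with separate weights for original and expanded terms can be illustrated with a minimal sketch. It assumes a pre-trained word-embedding table given as a plain dict mapping each term to a vector, selects the `k` nearest neighbours of each query term by cosine similarity, and gives the expanded terms a lower weight than the original ones. The function names `cosine` and `expand_query`, the toy vectors in `TOY_EMBEDDINGS`, and the weights `original_weight=1.0` and `expansion_weight=0.3` are hypothetical illustrations, not the code, parameters or Terrier configuration used by the authors.

```python
import math

# Hypothetical toy embeddings for illustration only (not real trained vectors).
TOY_EMBEDDINGS = {
    "cancer": [1.0, 0.1, 0.0],
    "tumor": [0.9, 0.2, 0.0],
    "neoplasm": [0.8, 0.1, 0.1],
    "protein": [0.0, 1.0, 0.0],
    "kinase": [0.1, 0.9, 0.1],
}


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def expand_query(query_terms, embeddings, k=2,
                 original_weight=1.0, expansion_weight=0.3):
    """Hypothetical sketch: return a dict mapping term -> weight.

    Original query terms receive `original_weight`; each term's k nearest
    neighbours in the embedding space receive `expansion_weight * similarity`.
    """
    weights = {}
    for term in query_terms:
        weights[term] = weights.get(term, 0.0) + original_weight

    for term in query_terms:
        if term not in embeddings:
            continue  # no vector for this term, so it cannot be expanded
        # Rank every other vocabulary term by similarity to the query term.
        candidates = sorted(
            ((cosine(embeddings[term], vec), other)
             for other, vec in embeddings.items()
             if other not in query_terms),
            reverse=True,
        )
        for sim, other in candidates[:k]:
            if sim <= 0:
                continue  # skip unrelated or opposite terms
            weights[other] = weights.get(other, 0.0) + expansion_weight * sim
    return weights
```

With the toy vectors, `expand_query(["cancer"], TOY_EMBEDDINGS)` keeps "cancer" at weight 1.0 and adds "tumor" and "neoplasm" with weights just under 0.3. Keeping expanded terms below the original ones reflects the abstract's point that the balance between original and expanded terms matters; the actual weighting scheme is described in the full article.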
Journal: Database (Oxford), Original Article
Published online: 2018-03-12
Copyright: © The Author(s) 2018. Published by Oxford University Press.
License: This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.