Baseline and extensions approach to information retrieval of complex medical data: Poznan's approach to the bioCADDIE 2016
Information retrieval from biomedical repositories has become a challenging task because of their increasing size and complexity. To facilitate the research aimed at improving the search for relevant documents, various information retrieval challenges have been launched. In this article, we present...
Main authors: | Cieslewicz, Artur; Dutkiewicz, Jakub; Jedrzejek, Czeslaw |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Oxford University Press 2018 |
Subjects: | Original Article |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5846287/ https://www.ncbi.nlm.nih.gov/pubmed/29688372 http://dx.doi.org/10.1093/database/bax103 |
_version_ | 1783305560544772096 |
---|---|
author | Cieslewicz, Artur Dutkiewicz, Jakub Jedrzejek, Czeslaw |
collection | PubMed |
description | Information retrieval from biomedical repositories has become a challenging task because of their increasing size and complexity. To facilitate research aimed at improving the search for relevant documents, various information retrieval challenges have been launched. In this article, we present the improved medical information retrieval systems designed by Poznan University of Technology and Poznan University of Medical Sciences as a contribution to the bioCADDIE 2016 challenge, a task focusing on information retrieval from a collection of 794 992 datasets generated from 20 biomedical repositories. The system developed by our team uses the Terrier 4.2 search platform enhanced by a query expansion method based on word embeddings. After post-challenge modifications and improvements (with particular regard to assigning proper weights to original and expanded terms), this approach allowed us to achieve the second-best infNDCG (0.4539) when compared with the challenge results, and an infAP of 0.3978. This demonstrates that proper use of word embeddings can be a valuable addition to the information retrieval process. We also provide some analysis of related work involving other bioCADDIE contributions, and we discuss the possibility of improving our results by using better word embedding schemes to find candidates for query expansion. Database URL: https://biocaddie.org/benchmark-data |
format | Online Article Text |
id | pubmed-5846287 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2018 |
publisher | Oxford University Press |
record_format | MEDLINE/PubMed |
spelling | pubmed-5846287 2018-03-21 Baseline and extensions approach to information retrieval of complex medical data: Poznan's approach to the bioCADDIE 2016 Cieslewicz, Artur Dutkiewicz, Jakub Jedrzejek, Czeslaw Database (Oxford) Original Article Information retrieval from biomedical repositories has become a challenging task because of their increasing size and complexity. To facilitate research aimed at improving the search for relevant documents, various information retrieval challenges have been launched. In this article, we present the improved medical information retrieval systems designed by Poznan University of Technology and Poznan University of Medical Sciences as a contribution to the bioCADDIE 2016 challenge, a task focusing on information retrieval from a collection of 794 992 datasets generated from 20 biomedical repositories. The system developed by our team uses the Terrier 4.2 search platform enhanced by a query expansion method based on word embeddings. After post-challenge modifications and improvements (with particular regard to assigning proper weights to original and expanded terms), this approach allowed us to achieve the second-best infNDCG (0.4539) when compared with the challenge results, and an infAP of 0.3978. This demonstrates that proper use of word embeddings can be a valuable addition to the information retrieval process. We also provide some analysis of related work involving other bioCADDIE contributions, and we discuss the possibility of improving our results by using better word embedding schemes to find candidates for query expansion. Database URL: https://biocaddie.org/benchmark-data Oxford University Press 2018-03-12 /pmc/articles/PMC5846287/ /pubmed/29688372 http://dx.doi.org/10.1093/database/bax103 Text en © The Author(s) 2018. Published by Oxford University Press.
http://creativecommons.org/licenses/by/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited. |
title | Baseline and extensions approach to information retrieval of complex medical data: Poznan's approach to the bioCADDIE 2016 |
topic | Original Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5846287/ https://www.ncbi.nlm.nih.gov/pubmed/29688372 http://dx.doi.org/10.1093/database/bax103 |
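
The description above mentions query expansion with word embeddings, with separate weights for original and expanded terms. The following is a minimal Python sketch of that general idea only; it is not the authors' implementation and does not use Terrier. The toy three-dimensional vectors, the function names, and the default weights (`original_weight`, `expansion_weight`) are all assumptions made for illustration.

```python
import math

# Toy 3-dimensional vectors standing in for real word embeddings.
# The values are invented for illustration and come from no trained model.
EMBEDDINGS = {
    "cancer": [0.9, 0.1, 0.0],
    "tumor": [0.8, 0.2, 0.1],
    "carcinoma": [0.85, 0.15, 0.05],
    "gene": [0.1, 0.9, 0.0],
    "protein": [0.2, 0.8, 0.1],
    "mouse": [0.0, 0.1, 0.9],
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def expand_query(terms, k=2, original_weight=1.0, expansion_weight=0.3):
    """Return a {term: weight} mapping for the expanded query.

    Original terms keep `original_weight`. Each of the k nearest
    neighbours of an original term is added with a weight of
    `expansion_weight` scaled by its cosine similarity (hypothetical scheme).
    """
    originals = set(terms)
    weighted = {term: original_weight for term in terms}
    for term in terms:
        if term not in EMBEDDINGS:
            continue  # no vector available, so nothing to expand
        neighbours = sorted(
            (
                (cosine(EMBEDDINGS[term], vec), other)
                for other, vec in EMBEDDINGS.items()
                if other != term
            ),
            reverse=True,
        )[:k]
        for similarity, other in neighbours:
            if other in originals:
                continue  # never down-weight a term the user typed
            candidate = expansion_weight * similarity
            weighted[other] = max(weighted.get(other, 0.0), candidate)
    return weighted


if __name__ == "__main__":
    for term, weight in expand_query(["cancer"]).items():
        print(f"{term}\t{weight:.3f}")
```

Keeping the expansion weight well below the original weight is one way to let expanded terms broaden recall without letting them dominate the ranking; suitable values would have to be tuned on relevance judgments.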