A robust approach to optimizing multi-source information for enhancing genomics retrieval performance

BACKGROUND: The users desire to be provided short, specific answers to questions and put them in context by linking original sources from the biomedical literature. Through the use of information retrieval technologies, information systems retrieve information to index data based on all kinds of pre...

Bibliographic Details
Main authors: Hu, Qinmin, Huang, Jimmy Xiangji, Miao, Jun
Format: Online Article Text
Language: English
Published: BioMed Central 2011
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3226256/
https://www.ncbi.nlm.nih.gov/pubmed/21989123
http://dx.doi.org/10.1186/1471-2105-12-S5-S6
_version_ 1782217587574177792
author Hu, Qinmin
Huang, Jimmy Xiangji
Miao, Jun
author_sort Hu, Qinmin
collection PubMed
description BACKGROUND: Users want short, specific answers to their questions, placed in context by links to the original sources in the biomedical literature. Information retrieval systems index and retrieve data using a variety of pre-defined search techniques and functions, so different ranking strategies are designed for different sources. In this paper, we propose a robust approach to optimizing multi-source information for improving genomics retrieval performance. RESULTS: In the proposed approach, we first consider a common scenario in which a metasearch system has access to multiple baselines, each retrieving and ranking documents/passages with its own model. Then, given selected baselines from multiple sources, we investigate three modified fusion methods, reciprocal, CombMNZ and CombSUM, to re-rank the candidates as the outputs for evaluation. Our empirical study on both the 2007 and 2006 genomics data sets demonstrates the viability of the proposed approach for obtaining better performance. Furthermore, the experimental results show that the reciprocal method provides notable improvements over the individual baselines, especially on the passage2-level MAP and the aspect-level MAP. CONCLUSIONS: From extensive experiments on two TREC genomics data sets, we draw the following conclusions. Of the three fusion methods in the proposed approach, the reciprocal method clearly outperforms CombMNZ and CombSUM, and CombSUM works well at the passage2 level when compared with CombMNZ. Across the multiple sources of DFR, BM25 and language model, we observe that the alliance of giants achieves the best result. Meanwhile, under the same combination, the better a baseline performs, the more it contributes. These conclusions are useful for directing fusion work in the field of biomedical information retrieval.
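For readers unfamiliar with the three fusion methods named in the abstract, the following is a minimal sketch of their standard textbook forms. It is not the authors' modified variants, which the abstract does not specify. It assumes that "reciprocal" refers to reciprocal rank fusion, that each baseline run is a mapping from passage ID to score, and that scores are min-max normalized before summing. The function names, the toy runs, and the constant k=60 are all illustrative assumptions.

```python
from collections import defaultdict


def min_max(run):
    """Scale one run's scores to [0, 1]; a constant run maps to 1.0."""
    lo, hi = min(run.values()), max(run.values())
    if hi == lo:
        return {doc: 1.0 for doc in run}
    return {doc: (score - lo) / (hi - lo) for doc, score in run.items()}


def comb_sum(runs):
    """CombSUM: sum of the normalized scores a candidate gets across runs."""
    fused = defaultdict(float)
    for run in runs:
        for doc, score in min_max(run).items():
            fused[doc] += score
    return dict(fused)


def comb_mnz(runs):
    """CombMNZ: CombSUM multiplied by the number of runs that returned the candidate."""
    summed = comb_sum(runs)
    hits = defaultdict(int)
    for run in runs:
        for doc in run:
            hits[doc] += 1
    return {doc: summed[doc] * hits[doc] for doc in summed}


def reciprocal(runs, k=60):
    """Reciprocal rank fusion: sum of 1 / (k + rank) over runs (k is an assumed constant)."""
    fused = defaultdict(float)
    for run in runs:
        ranked = sorted(run, key=run.get, reverse=True)
        for rank, doc in enumerate(ranked, start=1):
            fused[doc] += 1.0 / (k + rank)
    return dict(fused)


def rerank(fused):
    """Return candidate IDs ordered by fused score, best first."""
    return sorted(fused, key=fused.get, reverse=True)


# Hypothetical runs standing in for three baselines (e.g. DFR, BM25, language model).
runs = [
    {"p1": 12.0, "p2": 9.0, "p3": 3.0},
    {"p2": 0.8, "p3": 0.6, "p4": 0.1},
    {"p2": -4.0, "p1": -5.0, "p4": -9.0},
]

for name, method in [("CombSUM", comb_sum), ("CombMNZ", comb_mnz), ("reciprocal", reciprocal)]:
    print(name, rerank(method(runs)))
```

The score-based methods depend on how each baseline's scores are normalized, whereas the reciprocal method uses only rank positions, so it needs no normalization across models with different score scales.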
format Online
Article
Text
id pubmed-3226256
institution National Center for Biotechnology Information
language English
publishDate 2011
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-3226256 2011-11-30 A robust approach to optimizing multi-source information for enhancing genomics retrieval performance Hu, Qinmin Huang, Jimmy Xiangji Miao, Jun BMC Bioinformatics Proceedings BioMed Central 2011-07-27 /pmc/articles/PMC3226256/ /pubmed/21989123 http://dx.doi.org/10.1186/1471-2105-12-S5-S6 Text en Copyright ©2011 Hu et al; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title A robust approach to optimizing multi-source information for enhancing genomics retrieval performance
title_sort robust approach to optimizing multi-source information for enhancing genomics retrieval performance
topic Proceedings