
Modeling and mining term association for improving biomedical information retrieval performance


Bibliographic Details
Main Authors: Hu, Qinmin; Huang, Jimmy Xiangji; Hu, Xiaohua
Format: Online Article Text
Language: English
Published: BioMed Central 2012
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3372456/
https://www.ncbi.nlm.nih.gov/pubmed/22901087
http://dx.doi.org/10.1186/1471-2105-13-S9-S2
author Hu, Qinmin
Huang, Jimmy Xiangji
Hu, Xiaohua
collection PubMed
description BACKGROUND: The growth of biomedical information requires most information retrieval systems to provide short and specific answers in response to complex user queries. Semantic information in free text is structured in a way that makes it straightforward for humans to read but more difficult for computers to interpret automatically and search efficiently. One reason is that most traditional information retrieval models assume terms are conditionally independent given a document or passage. We are therefore motivated to consider term associations within different contexts, to help the models capture semantic information and use it to improve biomedical information retrieval performance. RESULTS: We propose a term association approach that discovers associations among the keywords of a query. Experiments are conducted on the TREC 2004-2007 Genomics data sets and the TREC 2004 HARD data set. The proposed approach outperforms both the baselines and the GSP results. An investigation of parameter settings and index types shows that the sentence-based index produces the best document-level results, the word-based index the best passage-level results, and the paragraph-based index the best passage2-level results. Furthermore, the best term association results always come from the best baseline. The tuning number k in the proposed recursive re-ranking algorithm is discussed and locally optimized to 10. CONCLUSIONS: First, modelling term association with factor analysis to improve biomedical information retrieval is one of the major contributions of our work. Second, the experiments confirm that term association, which considers co-occurrence and dependency among the keywords, can produce better results than baselines that treat the keywords independently. Third, the baselines are re-ranked according to the importance and reliance of the latent factors behind term associations. These latent factors are determined by the proposed model and by their term appearances in the passages retrieved in the first round.
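The abstract describes re-ranking a baseline result list using associations among query keywords, with a tuning number k. The following is only a minimal illustrative sketch of that general idea, not the authors' factor-analysis model or their recursive algorithm: it does a single pass over the top k passages and boosts each by the number of query-keyword pairs that co-occur in it. The function names, the `weight` parameter, and the scoring formula are all assumptions introduced for illustration.

```python
from itertools import combinations


def cooccurrence_score(passage, keywords):
    """Count pairs of query keywords that both occur in the passage.

    Hypothetical stand-in for a term association measure; matching is
    case-insensitive and uses whitespace tokenization only.
    """
    tokens = set(passage.lower().split())
    present = [kw for kw in keywords if kw.lower() in tokens]
    return len(list(combinations(present, 2)))


def rerank_top_k(ranked, keywords, k=10, weight=0.5):
    """Re-rank the first k (passage, score) pairs; leave the rest untouched.

    `ranked` is a baseline list sorted by descending score. `weight` is an
    assumed mixing parameter, not a value taken from the paper.
    """
    head, tail = ranked[:k], ranked[k:]
    rescored = [
        (passage, score + weight * cooccurrence_score(passage, keywords))
        for passage, score in head
    ]
    # Python's sort is stable, so ties keep their baseline order.
    rescored.sort(key=lambda pair: pair[1], reverse=True)
    return rescored + tail
```

Here a passage containing several query keywords together moves ahead of a passage with a slightly higher baseline score but no co-occurring keywords, while passages beyond position k keep their baseline order.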
format Online
Article
Text
id pubmed-3372456
institution National Center for Biotechnology Information
language English
publishDate 2012
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-3372456 2012-06-13 Modeling and mining term association for improving biomedical information retrieval performance Hu, Qinmin Huang, Jimmy Xiangji Hu, Xiaohua BMC Bioinformatics Proceedings BioMed Central 2012-06-11 /pmc/articles/PMC3372456/ /pubmed/22901087 http://dx.doi.org/10.1186/1471-2105-13-S9-S2 Text en Copyright ©2012 Hu et al.; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title Modeling and mining term association for improving biomedical information retrieval performance
topic Proceedings