Concept-based query expansion for retrieving gene related publications from MEDLINE

Bibliographic Details
Main authors: Matos, Sérgio, Arrais, Joel P, Maia-Rodrigues, João, Oliveira, José Luis
Format: Text
Language: English
Published: BioMed Central 2010
Subjects: Software
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2873540/
https://www.ncbi.nlm.nih.gov/pubmed/20426836
http://dx.doi.org/10.1186/1471-2105-11-212
author Matos, Sérgio
Arrais, Joel P
Maia-Rodrigues, João
Oliveira, José Luis
collection PubMed
description BACKGROUND: Advances in biotechnology and in high-throughput methods for gene analysis have contributed to an exponential increase in the number of scientific publications in these fields of study. While much of the data and results described in these articles are entered and annotated in the various existing biomedical databases, the scientific literature is still the major source of information. There is, therefore, a growing need for text mining and information retrieval tools to help researchers find the articles relevant to their study. To tackle this, several tools have been proposed that provide alternative solutions for specific user requests.
RESULTS: This paper presents QuExT, a new PubMed-based document retrieval and prioritization tool that, given a list of genes, searches the literature for the most relevant results. QuExT follows a concept-oriented query expansion methodology to find documents containing concepts related to the genes in the user input, such as protein and pathway names. The retrieved documents are ranked according to user-definable weights assigned to each concept class. By changing these weights, users can modify the ranking of the results to focus on documents dealing with a specific concept. The method's performance was evaluated using data from the 2004 TREC Genomics track, producing a mean average precision of 0.425, with an average of 4.8 and 31.3 relevant documents within the top 10 and top 100 retrieved abstracts, respectively.
CONCLUSIONS: QuExT implements a concept-based query expansion scheme that leverages gene-related information available from a variety of biological resources. The main advantage of the system is that it gives the user control over the ranking of the results through a simple weighting scheme. Using this approach, researchers can easily explore the literature on a group of genes and focus on the different aspects relating to these genes.
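The weighted ranking and the evaluation metrics named in the description can be illustrated with a minimal Python sketch. It assumes a simple linear score (the sum, over concept classes, of the class weight times the number of matched expansion concepts of that class); this record does not state QuExT's actual scoring formula, and the concept class names, weights, and document IDs below are hypothetical. The sketch also computes precision at k and non-interpolated average precision (AP) and its mean over queries (MAP), the standard TREC definitions, to show what the reported figures measure.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical: number of matched expansion concepts per class in one document,
# e.g. {"gene": 2, "protein": 1}. QuExT's real concept classes are not listed here.
ConceptHits = Dict[str, int]
Weights = Dict[str, float]


def score_document(hits: ConceptHits, weights: Weights) -> float:
    """Assumed linear score: sum of weight(class) * matches(class).

    Classes without a user-supplied weight contribute nothing.
    """
    return sum(weights.get(cls, 0.0) * count for cls, count in hits.items())


def rank_documents(docs: Dict[str, ConceptHits], weights: Weights) -> List[Tuple[str, float]]:
    """Return (doc_id, score) pairs, highest score first; ties broken by doc_id."""
    scored = [(doc_id, score_document(hits, weights)) for doc_id, hits in docs.items()]
    return sorted(scored, key=lambda pair: (-pair[1], pair[0]))


def precision_at_k(ranking: List[str], relevant: Set[str], k: int) -> float:
    """Fraction of the top-k retrieved documents that are relevant."""
    return sum(1 for doc_id in ranking[:k] if doc_id in relevant) / k


def average_precision(ranking: List[str], relevant: Set[str]) -> float:
    """Non-interpolated AP: sum of precision@k over the ranks k that hold a
    relevant document, divided by the total number of relevant documents
    (so relevant documents that are never retrieved contribute 0)."""
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for k, doc_id in enumerate(ranking, start=1):
        if doc_id in relevant:
            hits += 1
            precision_sum += hits / k
    return precision_sum / len(relevant)


def mean_average_precision(runs: List[Tuple[List[str], Set[str]]]) -> float:
    """MAP: average of the per-query AP values."""
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)


if __name__ == "__main__":
    # Hypothetical documents with concept-match counts per class.
    docs = {
        "doc_a": {"gene": 2, "protein": 1, "pathway": 0},
        "doc_b": {"gene": 1, "protein": 0, "pathway": 3},
        "doc_c": {"gene": 0, "protein": 2, "pathway": 1},
    }
    protein_focus = {"gene": 1.0, "protein": 3.0, "pathway": 0.5}
    pathway_focus = {"gene": 1.0, "protein": 0.5, "pathway": 3.0}

    print(rank_documents(docs, protein_focus))  # [('doc_c', 6.5), ('doc_a', 5.0), ('doc_b', 2.5)]
    print(rank_documents(docs, pathway_focus))  # [('doc_b', 10.0), ('doc_c', 4.0), ('doc_a', 2.5)]

    # AP of the protein-focused ranking against a hypothetical relevant set.
    ranking = [doc_id for doc_id, _ in rank_documents(docs, protein_focus)]
    print(average_precision(ranking, {"doc_b", "doc_c"}))
```

Raising the `pathway` weight moves the pathway-heavy `doc_b` from last to first, which mirrors the user-driven re-ranking the description refers to. In the same terms, an average of 4.8 relevant documents in the top 10 and 31.3 in the top 100 corresponds to mean precision values of 0.48 at 10 and 0.313 at 100.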
format Text
id pubmed-2873540
institution National Center for Biotechnology Information
language English
publishDate 2010
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-28735402010-05-20 Concept-based query expansion for retrieving gene related publications from MEDLINE Matos, Sérgio Arrais, Joel P Maia-Rodrigues, João Oliveira, José Luis BMC Bioinformatics Software BioMed Central 2010-04-28 /pmc/articles/PMC2873540/ /pubmed/20426836 http://dx.doi.org/10.1186/1471-2105-11-212 Text en Copyright ©2010 Matos et al; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title Concept-based query expansion for retrieving gene related publications from MEDLINE
topic Software