Is searching full text more effective than searching abstracts?
BACKGROUND: With the growing availability of full-text articles online, scientists and other consumers of the life sciences literature now have the ability to go beyond searching bibliographic records (title, abstract, metadata) to directly access full-text content. Motivated by this emerging trend,...
Main Author: | Lin, Jimmy |
---|---|
Format: | Text |
Language: | English |
Published: | BioMed Central 2009 |
Subjects: | Research Article |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2695361/ https://www.ncbi.nlm.nih.gov/pubmed/19192280 http://dx.doi.org/10.1186/1471-2105-10-46 |
_version_ | 1782168184875384832 |
---|---|
author | Lin, Jimmy |
author_facet | Lin, Jimmy |
author_sort | Lin, Jimmy |
collection | PubMed |
description | BACKGROUND: With the growing availability of full-text articles online, scientists and other consumers of the life sciences literature now have the ability to go beyond searching bibliographic records (title, abstract, metadata) to directly access full-text content. Motivated by this emerging trend, I posed the following question: is searching full text more effective than searching abstracts? This question is answered by comparing text retrieval algorithms on MEDLINE® abstracts, full-text articles, and spans (paragraphs) within full-text articles using data from the TREC 2007 genomics track evaluation. Two retrieval models are examined: BM25 and the ranking algorithm implemented in the open-source Lucene search engine. RESULTS: Experiments show that treating an entire article as an indexing unit does not consistently yield higher effectiveness compared to abstract-only search. However, retrieval based on spans, or paragraph-sized segments of full-text articles, consistently outperforms abstract-only search. Results suggest that highest overall effectiveness may be achieved by combining evidence from spans and full articles. CONCLUSION: Users searching full text are more likely to find relevant articles than users searching only abstracts. This finding affirms the value of full-text collections for text retrieval and provides a starting point for future work in exploring algorithms that take advantage of rapidly growing digital archives. Experimental results also highlight the need to develop distributed text retrieval algorithms, since full-text articles are significantly longer than abstracts and may require the computational resources of multiple machines in a cluster. The MapReduce programming model provides a convenient framework for organizing such computations. |
format | Text |
id | pubmed-2695361 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2009 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-2695361 2009-06-12 Is searching full text more effective than searching abstracts? Lin, Jimmy BMC Bioinformatics Research Article BACKGROUND: With the growing availability of full-text articles online, scientists and other consumers of the life sciences literature now have the ability to go beyond searching bibliographic records (title, abstract, metadata) to directly access full-text content. Motivated by this emerging trend, I posed the following question: is searching full text more effective than searching abstracts? This question is answered by comparing text retrieval algorithms on MEDLINE® abstracts, full-text articles, and spans (paragraphs) within full-text articles using data from the TREC 2007 genomics track evaluation. Two retrieval models are examined: BM25 and the ranking algorithm implemented in the open-source Lucene search engine. RESULTS: Experiments show that treating an entire article as an indexing unit does not consistently yield higher effectiveness compared to abstract-only search. However, retrieval based on spans, or paragraph-sized segments of full-text articles, consistently outperforms abstract-only search. Results suggest that highest overall effectiveness may be achieved by combining evidence from spans and full articles. CONCLUSION: Users searching full text are more likely to find relevant articles than users searching only abstracts. This finding affirms the value of full-text collections for text retrieval and provides a starting point for future work in exploring algorithms that take advantage of rapidly growing digital archives. Experimental results also highlight the need to develop distributed text retrieval algorithms, since full-text articles are significantly longer than abstracts and may require the computational resources of multiple machines in a cluster. The MapReduce programming model provides a convenient framework for organizing such computations. BioMed Central 2009-02-03 /pmc/articles/PMC2695361/ /pubmed/19192280 http://dx.doi.org/10.1186/1471-2105-10-46 Text en Copyright © 2009 Lin; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. |
spellingShingle | Research Article Lin, Jimmy Is searching full text more effective than searching abstracts? |
title | Is searching full text more effective than searching abstracts? |
title_full | Is searching full text more effective than searching abstracts? |
title_fullStr | Is searching full text more effective than searching abstracts? |
title_full_unstemmed | Is searching full text more effective than searching abstracts? |
title_short | Is searching full text more effective than searching abstracts? |
title_sort | is searching full text more effective than searching abstracts? |
topic | Research Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2695361/ https://www.ncbi.nlm.nih.gov/pubmed/19192280 http://dx.doi.org/10.1186/1471-2105-10-46 |
work_keys_str_mv | AT linjimmy issearchingfulltextmoreeffectivethansearchingabstracts |
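
The description in this record mentions BM25 ranking over paragraph-sized spans of full-text articles. As a rough illustration only, the sketch below scores spans with one common BM25 variant and ranks each article by its best-scoring span. Everything here is an assumption rather than the article's actual method: the function names, the `k1` and `b` defaults, the non-negative IDF form, the fixed-size token windows standing in for paragraphs, and the max-over-spans aggregation are all hypothetical choices made for the example.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Score each tokenized document in `docs` against `query` with BM25.

    k1 and b are commonly used defaults, not values taken from the article.
    """
    if not docs:
        return []
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs
    # Document frequency: number of documents that contain each term.
    df = Counter(term for d in docs for term in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query:
            if term not in tf:
                continue
            # Non-negative IDF variant (assumed; other formulations exist).
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(d) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


def split_spans(article, size=50):
    """Cut a tokenized article into fixed-size windows (stand-in for paragraphs)."""
    return [article[i:i + size] for i in range(0, len(article), size)]


def rank_articles_by_best_span(query, articles, size=50):
    """Return article indices ordered by the score of their best-scoring span."""
    spans, owner = [], []
    for idx, article in enumerate(articles):
        for span in split_spans(article, size):
            spans.append(span)
            owner.append(idx)
    best = [0.0] * len(articles)
    for score, idx in zip(bm25_scores(query, spans), owner):
        best[idx] = max(best[idx], score)
    return sorted(range(len(articles)), key=lambda i: best[i], reverse=True)
```

Passing whole articles directly to `bm25_scores` gives the article-as-indexing-unit baseline, so the two granularities the description contrasts can be compared on the same toy data.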