
PubRunner: A light-weight framework for updating text mining results

Biomedical text mining promises to assist biologists in quickly navigating the combined knowledge in their domain. This would allow improved understanding of the complex interactions within biological systems and faster hypothesis generation. New biomedical research articles are published daily and...


Bibliographic Details
Main Authors: Anekalla, Kishore R., Courneya, J.P., Fiorini, Nicolas, Lever, Jake, Muchow, Michael, Busby, Ben
Format: Online Article Text
Language: English
Published: F1000Research 2017
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5664974/
https://www.ncbi.nlm.nih.gov/pubmed/29152221
http://dx.doi.org/10.12688/f1000research.11389.2
author Anekalla, Kishore R.
Courneya, J.P.
Fiorini, Nicolas
Lever, Jake
Muchow, Michael
Busby, Ben
author_sort Anekalla, Kishore R.
collection PubMed
description Biomedical text mining promises to assist biologists in quickly navigating the combined knowledge in their domain. This would allow improved understanding of the complex interactions within biological systems and faster hypothesis generation. New biomedical research articles are published daily and text mining tools are only as good as the corpus from which they work. Many text mining tools are underused because their results are static and do not reflect the constantly expanding knowledge in the field. In order for biomedical text mining to become an indispensable tool used by researchers, this problem must be addressed. To this end, we present PubRunner, a framework for regularly running text mining tools on the latest publications. PubRunner is lightweight, simple to use, and can be integrated with an existing text mining tool. The workflow involves downloading the latest abstracts from PubMed, executing a user-defined tool, pushing the resulting data to a public FTP or Zenodo dataset, and publicizing the location of these results on the public PubRunner website. We illustrate the use of this tool by re-running the commonly used word2vec tool on the latest PubMed abstracts to generate up-to-date word vector representations for the biomedical domain. This shows a proof of concept that we hope will encourage text mining developers to build tools that truly will aid biologists in exploring the latest publications.
format Online
Article
Text
id pubmed-5664974
institution National Center for Biotechnology Information
language English
publishDate 2017
publisher F1000Research
record_format MEDLINE/PubMed
spelling pubmed-5664974 2017-11-17 PubRunner: A light-weight framework for updating text mining results Anekalla, Kishore R. Courneya, J.P. Fiorini, Nicolas Lever, Jake Muchow, Michael Busby, Ben F1000Res Software Tool Article F1000Research 2017-10-13 /pmc/articles/PMC5664974/ /pubmed/29152221 http://dx.doi.org/10.12688/f1000research.11389.2 Text en
Copyright: © 2017 Anekalla KR et al. http://creativecommons.org/licenses/by/4.0/
This is an open access article distributed under the terms of the Creative Commons Attribution Licence, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. The author(s) is/are employees of the US Government and therefore domestic copyright protection in USA does not apply to this work. The work may be protected under the copyright laws of other jurisdictions when used in those jurisdictions.
title PubRunner: A light-weight framework for updating text mining results
topic Software Tool Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5664974/
https://www.ncbi.nlm.nih.gov/pubmed/29152221
http://dx.doi.org/10.12688/f1000research.11389.2
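
The abstract describes a four-step workflow: download the latest abstracts, execute a user-defined tool, push the results to a public location, and publicize that location. The sketch below is one way to picture that loop and is not PubRunner's actual code or API. Every name in it (`run_update_cycle`, `RunReport`, `count_words`) is hypothetical, each step is passed in as a plain function so that no network access is needed, and a simple word counter stands in for a real text mining tool such as word2vec.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class RunReport:
    """Summary of one update cycle (hypothetical structure)."""
    n_abstracts: int
    result_location: str


def run_update_cycle(
    fetch_abstracts: Callable[[], List[str]],
    tool: Callable[[List[str]], Dict[str, int]],
    publish: Callable[[Dict[str, int]], str],
    announce: Callable[[str], None],
) -> RunReport:
    """Run the four steps named in the abstract, in order.

    Assumption: each step is an injected callable, so a real deployment
    could swap in a PubMed download, an FTP or Zenodo upload, and so on.
    """
    abstracts = fetch_abstracts()   # 1. get the latest abstracts
    results = tool(abstracts)       # 2. execute the user-defined tool
    location = publish(results)     # 3. push results to a public location
    announce(location)              # 4. publicize where the results live
    return RunReport(n_abstracts=len(abstracts), result_location=location)


def count_words(abstracts: List[str]) -> Dict[str, int]:
    """Stand-in tool: lowercase word counts, NOT word2vec."""
    counts: Dict[str, int] = {}
    for text in abstracts:
        for word in text.lower().split():
            counts[word] = counts.get(word, 0) + 1
    return counts


if __name__ == "__main__":
    storage: Dict[str, Dict[str, int]] = {}
    announcements: List[str] = []

    def publish_in_memory(results: Dict[str, int]) -> str:
        # Stand-in for an upload; keeps results in a dict.
        storage["latest"] = results
        return "memory://latest"

    report = run_update_cycle(
        fetch_abstracts=lambda: ["Gene A binds gene B", "gene B"],
        tool=count_words,
        publish=publish_in_memory,
        announce=announcements.append,
    )
    print(report)
```

Passing the steps in as functions keeps the scheduling logic separate from any particular tool or hosting service, which mirrors the abstract's claim that the framework can be integrated with an existing text mining tool.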