
PubMedPortable: A Framework for Supporting the Development of Text Mining Applications

Information extraction from biomedical literature is continuously growing in scope and importance. Many tools exist that perform named entity recognition, e.g. of proteins, chemical compounds, and diseases. Furthermore, several approaches deal with the extraction of relations between identified enti...


Bibliographic Details
Main Authors:	Döring, Kersten; Grüning, Björn A.; Telukunta, Kiran K.; Thomas, Philippe; Günther, Stefan
Format:	Online Article Text
Language:	English
Published:	Public Library of Science 2016
Subjects:	Research Article
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5051953/ https://www.ncbi.nlm.nih.gov/pubmed/27706202 http://dx.doi.org/10.1371/journal.pone.0163794

author	Döring, Kersten; Grüning, Björn A.; Telukunta, Kiran K.; Thomas, Philippe; Günther, Stefan
author_sort	Döring, Kersten
collection	PubMed
description	Information extraction from biomedical literature is continuously growing in scope and importance. Many tools exist that perform named entity recognition, e.g. of proteins, chemical compounds, and diseases. Furthermore, several approaches deal with the extraction of relations between identified entities. The BioCreative community supports these developments with yearly open challenges, which led to a standardised XML text annotation format called BioC. PubMed provides access to the largest open biomedical literature repository, but there is no unified way of connecting its data to natural language processing tools. Therefore, an appropriate data environment is needed as a basis to combine different software solutions and to develop customised text mining applications. PubMedPortable builds a relational database and a full text index on PubMed citations. It can be applied either to the complete PubMed data set or an arbitrary subset of downloaded PubMed XML files. The software provides the infrastructure to combine stand-alone applications by exporting different data formats, e.g. BioC. The presented workflows show how to use PubMedPortable to retrieve, store, and analyse a disease-specific data set. The provided use cases are well documented in the PubMedPortable wiki. The open-source software library is small, easy to use, and scalable to the user’s system requirements. It is freely available for Linux on the web at https://github.com/KerstenDoering/PubMedPortable and for other operating systems as a virtual container. The approach was tested extensively and applied successfully in several projects.
format	Online Article Text
id	pubmed-5051953
institution	National Center for Biotechnology Information
language	English
publishDate	2016
publisher	Public Library of Science
record_format	MEDLINE/PubMed
spelling	pubmed-5051953 2016-10-27 PubMedPortable: A Framework for Supporting the Development of Text Mining Applications. Döring, Kersten; Grüning, Björn A.; Telukunta, Kiran K.; Thomas, Philippe; Günther, Stefan. PLoS One. Research Article. Public Library of Science 2016-10-05 /pmc/articles/PMC5051953/ /pubmed/27706202 http://dx.doi.org/10.1371/journal.pone.0163794 Text en © 2016 Döring et al. http://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
title	PubMedPortable: A Framework for Supporting the Development of Text Mining Applications
topic	Research Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5051953/ https://www.ncbi.nlm.nih.gov/pubmed/27706202 http://dx.doi.org/10.1371/journal.pone.0163794
