Open Agile text mining for bioinformatics: the PubAnnotation ecosystem

MOTIVATION: Most currently available text mining tools share two characteristics that make them less than optimal for use by biomedical researchers: they require extensive specialist skills in natural language processing and they were built on the assumption that they should optimize global performa...

Bibliographic Details
Main Authors:	Kim, Jin-Dong; Wang, Yue; Fujiwara, Toyofumi; Okuda, Shujiro; Callahan, Tiffany J; Cohen, K Bretonnel
Format:	Online Article Text
Language:	English
Published:	Oxford University Press 2019
Subjects:	Original Papers
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6821251/ https://www.ncbi.nlm.nih.gov/pubmed/30937439 http://dx.doi.org/10.1093/bioinformatics/btz227

_version_	1783464111001042944
author	Kim, Jin-Dong Wang, Yue Fujiwara, Toyofumi Okuda, Shujiro Callahan, Tiffany J Cohen, K Bretonnel
author_facet	Kim, Jin-Dong Wang, Yue Fujiwara, Toyofumi Okuda, Shujiro Callahan, Tiffany J Cohen, K Bretonnel
author_sort	Kim, Jin-Dong
collection	PubMed
description	MOTIVATION: Most currently available text mining tools share two characteristics that make them less than optimal for use by biomedical researchers: they require extensive specialist skills in natural language processing and they were built on the assumption that they should optimize global performance metrics on representative datasets. This is a problem because most end-users are not natural language processing specialists and because biomedical researchers often care less about global metrics like F-measure or representative datasets than they do about more granular metrics such as precision and recall on their own specialized datasets. Thus, there are fundamental mismatches between the assumptions of much text mining work and the preferences of potential end-users. RESULTS: This article introduces the concept of Agile text mining, and presents the PubAnnotation ecosystem as an example implementation. The system approaches the problems from two perspectives: it allows the reformulation of text mining by biomedical researchers from the task of assembling a complete system to the task of retrieving warehoused annotations, and it makes it possible to do very targeted customization of the pre-existing system to address specific end-user requirements. Two use cases are presented: assisted curation of the GlycoEpitope database, and assessing coverage in the literature of pre-eclampsia-associated genes. AVAILABILITY AND IMPLEMENTATION: The three tools that make up the ecosystem, PubAnnotation, PubDictionaries and TextAE are publicly available as web services, and also as open source projects. The dictionaries and the annotation datasets associated with the use cases are all publicly available through PubDictionaries and PubAnnotation, respectively.
format	Online Article Text
id	pubmed-6821251
institution	National Center for Biotechnology Information
language	English
publishDate	2019
publisher	Oxford University Press
record_format	MEDLINE/PubMed
spelling	pubmed-6821251 2019-11-04 Open Agile text mining for bioinformatics: the PubAnnotation ecosystem Kim, Jin-Dong Wang, Yue Fujiwara, Toyofumi Okuda, Shujiro Callahan, Tiffany J Cohen, K Bretonnel Bioinformatics Original Papers Oxford University Press 2019-11-01 2019-04-01 /pmc/articles/PMC6821251/ /pubmed/30937439 http://dx.doi.org/10.1093/bioinformatics/btz227 Text en © The Author(s) 2019. Published by Oxford University Press. http://creativecommons.org/licenses/by/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
spellingShingle	Original Papers Kim, Jin-Dong Wang, Yue Fujiwara, Toyofumi Okuda, Shujiro Callahan, Tiffany J Cohen, K Bretonnel Open Agile text mining for bioinformatics: the PubAnnotation ecosystem
title	Open Agile text mining for bioinformatics: the PubAnnotation ecosystem
title_sort	open agile text mining for bioinformatics: the pubannotation ecosystem
topic	Original Papers
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6821251/ https://www.ncbi.nlm.nih.gov/pubmed/30937439 http://dx.doi.org/10.1093/bioinformatics/btz227
work_keys_str_mv	AT kimjindong openagiletextminingforbioinformaticsthepubannotationecosystem AT wangyue openagiletextminingforbioinformaticsthepubannotationecosystem AT fujiwaratoyofumi openagiletextminingforbioinformaticsthepubannotationecosystem AT okudashujiro openagiletextminingforbioinformaticsthepubannotationecosystem AT callahantiffanyj openagiletextminingforbioinformaticsthepubannotationecosystem AT cohenkbretonnel openagiletextminingforbioinformaticsthepubannotationecosystem
