Web services-based text-mining demonstrates broad impacts for interoperability and process simplification

The Critical Assessment of Information Extraction systems in Biology (BioCreAtIvE) challenge evaluation tasks collectively represent a community-wide effort to evaluate a variety of text-mining and information extraction systems applied to the biological domain. The BioCreative IV Workshop included...

Bibliographic Details
Main authors:	Wiegers, Thomas C.; Davis, Allan Peter; Mattingly, Carolyn J.
Format:	Online Article Text
Language:	English
Published:	Oxford University Press 2014
Subjects:	Original Article
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4207221/ https://www.ncbi.nlm.nih.gov/pubmed/24919658 http://dx.doi.org/10.1093/database/bau050

author	Wiegers, Thomas C.; Davis, Allan Peter; Mattingly, Carolyn J.
author_facet	Wiegers, Thomas C.; Davis, Allan Peter; Mattingly, Carolyn J.
author_sort	Wiegers, Thomas C.
collection	PubMed
description	The Critical Assessment of Information Extraction systems in Biology (BioCreAtIvE) challenge evaluation tasks collectively represent a community-wide effort to evaluate a variety of text-mining and information extraction systems applied to the biological domain. The BioCreative IV Workshop included five independent subject areas; one of them, Track 3, focused on named-entity recognition (NER) for the Comparative Toxicogenomics Database (CTD; http://ctdbase.org). Previously, CTD had organized document ranking and NER-related tasks for the BioCreative Workshop 2012; a key finding of that effort was that interoperability and integration complexity were major impediments to the direct application of the systems to CTD's text-mining pipeline. This underscored a prevailing problem with software integration efforts. Major interoperability-related issues included lack of process modularity, operating system incompatibility, tool configuration complexity and lack of standardization of high-level inter-process communications. One approach to potentially mitigate interoperability and general integration issues is the use of Web services to abstract implementation details: rather than integrating NER tools directly, HTTP-based calls from CTD's asynchronous, batch-oriented text-mining pipeline could be made to remote NER Web services for recognition of specific biological terms, using BioC (an emerging family of XML formats) for inter-process communications. To test this concept, participating groups developed Representational State Transfer (REST)/BioC-compliant Web services tailored to CTD's NER requirements. Participants were provided with a comprehensive set of training materials. CTD evaluated results obtained from the remote Web service-based URLs against a test data set of 510 manually curated scientific articles. Twelve groups participated in the challenge. Recall, precision, balanced F-scores and response times were calculated. Top balanced F-scores for gene, chemical and disease NER were 61%, 74% and 51%, respectively. Response times ranged from fractions of a second to over a minute per article. We present a description of the challenge and a summary of results, demonstrating how curation groups can effectively use interoperable NER technologies to simplify text-mining pipeline implementation. Database URL: http://ctdbase.org/
format	Online Article Text
id	pubmed-4207221
institution	National Center for Biotechnology Information
language	English
publishDate	2014
publisher	Oxford University Press
record_format	MEDLINE/PubMed
spelling	pubmed-4207221 2014-10-28 Web services-based text-mining demonstrates broad impacts for interoperability and process simplification Wiegers, Thomas C.; Davis, Allan Peter; Mattingly, Carolyn J. Database (Oxford) Original Article Oxford University Press 2014-06-10 /pmc/articles/PMC4207221/ /pubmed/24919658 http://dx.doi.org/10.1093/database/bau050 Text en © The Author(s) 2014. Published by Oxford University Press. http://creativecommons.org/licenses/by/3.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/3.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
title	Web services-based text-mining demonstrates broad impacts for interoperability and process simplification
topic	Original Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4207221/ https://www.ncbi.nlm.nih.gov/pubmed/24919658 http://dx.doi.org/10.1093/database/bau050
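
The description reports recall, precision and balanced F-scores for each NER category. As a minimal sketch of how such scores are conventionally computed, the snippet below compares a set of predicted terms with a set of curated terms for one article. The record does not state CTD's exact matching or averaging rules, so the set-based exact matching, the function name `precision_recall_f1` and the gene symbols are assumptions for illustration only.

```python
from typing import Set, Tuple


def precision_recall_f1(predicted: Set[str], gold: Set[str]) -> Tuple[float, float, float]:
    """Return (precision, recall, balanced F-score) for one set of predictions.

    Assumption: a prediction counts as correct only if it exactly matches
    a curated (gold) term; the record does not describe CTD's matching rule.
    """
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    # The balanced F-score is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical example data, not taken from the CTD test set.
gold_terms = {"TP53", "BRCA1", "EGFR", "KRAS"}
predicted_terms = {"TP53", "BRCA1", "MYC"}

precision, recall, f1 = precision_recall_f1(predicted_terms, gold_terms)
print(f"precision={precision:.2f} recall={recall:.2f} F={f1:.2f}")
```

Here two of the three predictions are correct and two of the four curated terms are found, so precision is 2/3, recall is 1/2 and the balanced F-score is 4/7.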
