
Web services-based text-mining demonstrates broad impacts for interoperability and process simplification

The Critical Assessment of Information Extraction systems in Biology (BioCreAtIvE) challenge evaluation tasks collectively represent a community-wide effort to evaluate a variety of text-mining and information extraction systems applied to the biological domain. The BioCreative IV Workshop included...


Bibliographic Details
Main Authors: Wiegers, Thomas C.; Davis, Allan Peter; Mattingly, Carolyn J.
Format: Online Article Text
Language: English
Published: Oxford University Press 2014
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4207221/
https://www.ncbi.nlm.nih.gov/pubmed/24919658
http://dx.doi.org/10.1093/database/bau050
_version_ 1782340937381314560
author Wiegers, Thomas C.
Davis, Allan Peter
Mattingly, Carolyn J.
author_facet Wiegers, Thomas C.
Davis, Allan Peter
Mattingly, Carolyn J.
author_sort Wiegers, Thomas C.
collection PubMed
description The Critical Assessment of Information Extraction systems in Biology (BioCreAtIvE) challenge evaluation tasks collectively represent a community-wide effort to evaluate a variety of text-mining and information extraction systems applied to the biological domain. The BioCreative IV Workshop included five independent subject areas, including Track 3, which focused on named-entity recognition (NER) for the Comparative Toxicogenomics Database (CTD; http://ctdbase.org). Previously, CTD had organized document ranking and NER-related tasks for the BioCreative Workshop 2012; a key finding of that effort was that interoperability and integration complexity were major impediments to the direct application of the systems to CTD's text-mining pipeline. This underscored a prevailing problem with software integration efforts. Major interoperability-related issues included lack of process modularity, operating system incompatibility, tool configuration complexity and lack of standardization of high-level inter-process communications. One approach to potentially mitigate interoperability and general integration issues is the use of Web services to abstract implementation details; rather than integrating NER tools directly, HTTP-based calls from CTD's asynchronous, batch-oriented text-mining pipeline could be made to remote NER Web services for recognition of specific biological terms using BioC (an emerging family of XML formats) for inter-process communications. To test this concept, participating groups developed Representational State Transfer (REST)/BioC-compliant Web services tailored to CTD's NER requirements. Participants were provided with a comprehensive set of training materials. CTD evaluated results obtained from the remote Web service-based URLs against a test data set of 510 manually curated scientific articles. Twelve groups participated in the challenge. Recall, precision, balanced F-scores and response times were calculated. Top balanced F-scores for gene, chemical and disease NER were 61, 74 and 51%, respectively. Response times ranged from fractions of a second to over a minute per article. We present a description of the challenge and summary of results, demonstrating how curation groups can effectively use interoperable NER technologies to simplify text-mining pipeline implementation. Database URL: http://ctdbase.org/
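The description above reports recall, precision and balanced F-scores without giving the formulas. As an illustration only, the sketch below computes the three measures for a single article by comparing a set of terms returned by an NER service with a set of manually curated terms. The function name, the exact-match comparison on term strings and the sample gene symbols are assumptions made for this example; the record does not state how CTD matched or aggregated results.

```python
from typing import Iterable, Tuple


def balanced_f_score(predicted: Iterable[str],
                     curated: Iterable[str]) -> Tuple[float, float, float]:
    """Return (precision, recall, balanced F-score) for one article.

    Assumption: a predicted term counts as correct only if it exactly
    matches a curated term. This is a simplification for illustration.
    """
    predicted_set = set(predicted)
    curated_set = set(curated)

    # Terms found by the tool that also appear in the curated set.
    true_positives = len(predicted_set & curated_set)

    # Precision: fraction of predicted terms that are correct.
    precision = true_positives / len(predicted_set) if predicted_set else 0.0
    # Recall: fraction of curated terms that were found.
    recall = true_positives / len(curated_set) if curated_set else 0.0

    # Balanced F-score: harmonic mean of precision and recall.
    total = precision + recall
    f_score = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_score


if __name__ == "__main__":
    # Hypothetical terms, not taken from the CTD test data set.
    tool_output = {"TP53", "BRCA1", "EGFR", "MYC"}
    curated_terms = {"TP53", "BRCA1", "KRAS"}
    p, r, f = balanced_f_score(tool_output, curated_terms)
    print(f"precision={p:.2f} recall={r:.2f} F={f:.2f}")
```

Because the balanced F-score is a harmonic mean, it stays low unless precision and recall are both reasonably high, which is why it is commonly used as a single summary figure for NER output.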
format Online
Article
Text
id pubmed-4207221
institution National Center for Biotechnology Information
language English
publishDate 2014
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-42072212014-10-28 Web services-based text-mining demonstrates broad impacts for interoperability and process simplification Wiegers, Thomas C. Davis, Allan Peter Mattingly, Carolyn J. Database (Oxford) Original Article The Critical Assessment of Information Extraction systems in Biology (BioCreAtIvE) challenge evaluation tasks collectively represent a community-wide effort to evaluate a variety of text-mining and information extraction systems applied to the biological domain. The BioCreative IV Workshop included five independent subject areas, including Track 3, which focused on named-entity recognition (NER) for the Comparative Toxicogenomics Database (CTD; http://ctdbase.org). Previously, CTD had organized document ranking and NER-related tasks for the BioCreative Workshop 2012; a key finding of that effort was that interoperability and integration complexity were major impediments to the direct application of the systems to CTD's text-mining pipeline. This underscored a prevailing problem with software integration efforts. Major interoperability-related issues included lack of process modularity, operating system incompatibility, tool configuration complexity and lack of standardization of high-level inter-process communications. One approach to potentially mitigate interoperability and general integration issues is the use of Web services to abstract implementation details; rather than integrating NER tools directly, HTTP-based calls from CTD's asynchronous, batch-oriented text-mining pipeline could be made to remote NER Web services for recognition of specific biological terms using BioC (an emerging family of XML formats) for inter-process communications. To test this concept, participating groups developed Representational State Transfer /BioC-compliant Web services tailored to CTD's NER requirements. Participants were provided with a comprehensive set of training materials. 
CTD evaluated results obtained from the remote Web service-based URLs against a test data set of 510 manually curated scientific articles. Twelve groups participated in the challenge. Recall, precision, balanced F-scores and response times were calculated. Top balanced F-scores for gene, chemical and disease NER were 61, 74 and 51%, respectively. Response times ranged from fractions-of-a-second to over a minute per article. We present a description of the challenge and summary of results, demonstrating how curation groups can effectively use interoperable NER technologies to simplify text-mining pipeline implementation. Database URL: http://ctdbase.org/ Oxford University Press 2014-06-10 /pmc/articles/PMC4207221/ /pubmed/24919658 http://dx.doi.org/10.1093/database/bau050 Text en © The Author(s) 2014. Published by Oxford University Press. http://creativecommons.org/licenses/by/3.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/3.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
spellingShingle Original Article
Wiegers, Thomas C.
Davis, Allan Peter
Mattingly, Carolyn J.
Web services-based text-mining demonstrates broad impacts for interoperability and process simplification
title Web services-based text-mining demonstrates broad impacts for interoperability and process simplification
title_full Web services-based text-mining demonstrates broad impacts for interoperability and process simplification
title_fullStr Web services-based text-mining demonstrates broad impacts for interoperability and process simplification
title_full_unstemmed Web services-based text-mining demonstrates broad impacts for interoperability and process simplification
title_short Web services-based text-mining demonstrates broad impacts for interoperability and process simplification
title_sort web services-based text-mining demonstrates broad impacts for interoperability and process simplification
topic Original Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4207221/
https://www.ncbi.nlm.nih.gov/pubmed/24919658
http://dx.doi.org/10.1093/database/bau050
work_keys_str_mv AT wiegersthomasc webservicesbasedtextminingdemonstratesbroadimpactsforinteroperabilityandprocesssimplification
AT davisallanpeter webservicesbasedtextminingdemonstratesbroadimpactsforinteroperabilityandprocesssimplification
AT mattinglycarolynj webservicesbasedtextminingdemonstratesbroadimpactsforinteroperabilityandprocesssimplification