Web services-based text-mining demonstrates broad impacts for interoperability and process simplification
The Critical Assessment of Information Extraction systems in Biology (BioCreAtIvE) challenge evaluation tasks collectively represent a community-wide effort to evaluate a variety of text-mining and information extraction systems applied to the biological domain. The BioCreative IV Workshop included...
Main authors: | Wiegers, Thomas C.; Davis, Allan Peter; Mattingly, Carolyn J. |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Oxford University Press, 2014 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4207221/ https://www.ncbi.nlm.nih.gov/pubmed/24919658 http://dx.doi.org/10.1093/database/bau050 |
_version_ | 1782340937381314560 |
---|---|
author | Wiegers, Thomas C.; Davis, Allan Peter; Mattingly, Carolyn J. |
author_facet | Wiegers, Thomas C.; Davis, Allan Peter; Mattingly, Carolyn J. |
author_sort | Wiegers, Thomas C. |
collection | PubMed |
description | The Critical Assessment of Information Extraction systems in Biology (BioCreAtIvE) challenge evaluation tasks collectively represent a community-wide effort to evaluate a variety of text-mining and information extraction systems applied to the biological domain. The BioCreative IV Workshop included five independent subject areas, including Track 3, which focused on named-entity recognition (NER) for the Comparative Toxicogenomics Database (CTD; http://ctdbase.org). Previously, CTD had organized document ranking and NER-related tasks for the BioCreative Workshop 2012; a key finding of that effort was that interoperability and integration complexity were major impediments to the direct application of the systems to CTD's text-mining pipeline. This underscored a prevailing problem with software integration efforts. Major interoperability-related issues included lack of process modularity, operating system incompatibility, tool configuration complexity and lack of standardization of high-level inter-process communications. One approach to potentially mitigate interoperability and general integration issues is the use of Web services to abstract implementation details; rather than integrating NER tools directly, HTTP-based calls from CTD's asynchronous, batch-oriented text-mining pipeline could be made to remote NER Web services for recognition of specific biological terms using BioC (an emerging family of XML formats) for inter-process communications. To test this concept, participating groups developed Representational State Transfer/BioC-compliant Web services tailored to CTD's NER requirements. Participants were provided with a comprehensive set of training materials. CTD evaluated results obtained from the remote Web service-based URLs against a test data set of 510 manually curated scientific articles. Twelve groups participated in the challenge. Recall, precision, balanced F-scores and response times were calculated. Top balanced F-scores for gene, chemical and disease NER were 61, 74 and 51%, respectively. Response times ranged from fractions-of-a-second to over a minute per article. We present a description of the challenge and summary of results, demonstrating how curation groups can effectively use interoperable NER technologies to simplify text-mining pipeline implementation. Database URL: http://ctdbase.org/ |
format | Online Article Text |
id | pubmed-4207221 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2014 |
publisher | Oxford University Press |
record_format | MEDLINE/PubMed |
spelling | pubmed-4207221 2014-10-28 Web services-based text-mining demonstrates broad impacts for interoperability and process simplification Wiegers, Thomas C. Davis, Allan Peter Mattingly, Carolyn J. Database (Oxford) Original Article Oxford University Press 2014-06-10 /pmc/articles/PMC4207221/ /pubmed/24919658 http://dx.doi.org/10.1093/database/bau050 Text en © The Author(s) 2014. Published by Oxford University Press. http://creativecommons.org/licenses/by/3.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/3.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited. |
spellingShingle | Original Article Wiegers, Thomas C. Davis, Allan Peter Mattingly, Carolyn J. Web services-based text-mining demonstrates broad impacts for interoperability and process simplification |
title | Web services-based text-mining demonstrates broad impacts for interoperability and process simplification |
title_sort | web services-based text-mining demonstrates broad impacts for interoperability and process simplification |
topic | Original Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4207221/ https://www.ncbi.nlm.nih.gov/pubmed/24919658 http://dx.doi.org/10.1093/database/bau050 |
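
The description above reports recall, precision and balanced F-scores for each NER category. As a rough illustration of how those three figures relate, the sketch below scores one set of predicted terms against a curated set. It is a minimal sketch under stated assumptions, not the evaluation code used by CTD: the function name `score_ner`, the set-based exact matching of terms, and the example terms are all hypothetical, and the record does not say how matches were counted or how scores were averaged across the 510 test articles. The only fixed point is the standard definition of the balanced F-score as the harmonic mean of precision and recall.

```python
from typing import Set, Tuple


def score_ner(predicted: Set[str], curated: Set[str]) -> Tuple[float, float, float]:
    """Return (precision, recall, balanced F-score) for one set of predictions.

    Assumption: a prediction counts as correct only if the same term
    appears in the manually curated set (exact set membership).
    """
    true_positives = len(predicted & curated)
    # Guard against empty sets so the ratios are always defined.
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(curated) if curated else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    # Balanced F-score: harmonic mean of precision and recall.
    f_score = 2 * precision * recall / (precision + recall)
    return precision, recall, f_score


# Hypothetical example: chemical terms for a single article.
curated = {"arsenic", "cadmium", "lead", "benzene"}
predicted = {"arsenic", "cadmium", "toluene"}

precision, recall, f_score = score_ner(predicted, curated)
print(f"precision={precision:.3f} recall={recall:.3f} F={f_score:.3f}")
```

In this made-up case two of the three predicted terms are correct and two of the four curated terms are found, so precision exceeds recall and the balanced F-score falls between them.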