
Next generation community assessment of biomedical entity recognition web servers: metrics, performance, interoperability aspects of BeCalm


Bibliographic Details
Main authors: Pérez-Pérez, Martin; Pérez-Rodríguez, Gael; Blanco-Míguez, Aitor; Fdez-Riverola, Florentino; Valencia, Alfonso; Krallinger, Martin; Lourenço, Anália
Format: Online Article Text
Language: English
Published: Springer International Publishing 2019
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6591930/
https://www.ncbi.nlm.nih.gov/pubmed/31236786
http://dx.doi.org/10.1186/s13321-019-0363-6
author Pérez-Pérez, Martin
Pérez-Rodríguez, Gael
Blanco-Míguez, Aitor
Fdez-Riverola, Florentino
Valencia, Alfonso
Krallinger, Martin
Lourenço, Anália
collection PubMed
description BACKGROUND: Shared tasks and community challenges represent key instruments to promote research, collaboration and determine the state of the art of biomedical and chemical text mining technologies. Traditionally, such tasks relied on the comparison of automatically generated results against a so-called Gold Standard dataset of manually labelled textual data, regardless of efficiency and robustness of the underlying implementations. Due to the rapid growth of unstructured data collections, including patent databases and particularly the scientific literature, there is a pressing need to generate, assess and expose robust big data text mining solutions to semantically enrich documents in real time. To address this pressing need, a novel track called “Technical interoperability and performance of annotation servers” was launched under the umbrella of the BioCreative text mining evaluation effort. The aim of this track was to enable the continuous assessment of technical aspects of text annotation web servers, specifically of online biomedical named entity recognition systems of interest for medicinal chemistry applications.
RESULTS: A total of 15 out of 26 registered teams successfully implemented online annotation servers. They returned predictions during a two-month period in predefined formats and were evaluated through the BeCalm evaluation platform, specifically developed for this track. The track encompassed three levels of evaluation, i.e. data format considerations, technical metrics and functional specifications. Participating annotation servers were implemented in seven different programming languages and covered 12 general entity types. The continuous evaluation of server responses accounted for testing periods of low activity and moderate to high activity, encompassing overall 4,092,502 requests from three different document provider settings. The median response time was below 3.74 s, with a median of 10 annotations/document. Most of the servers showed great reliability and stability, being able to process over 100,000 requests in a 5-day period.
CONCLUSIONS: The presented track was a novel experimental task that systematically evaluated the technical performance aspects of online entity recognition systems. It raised the interest of a significant number of participants. Future editions of the competition will address the ability to process documents in bulk as well as to annotate full-text documents.
ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (10.1186/s13321-019-0363-6) contains supplementary material, which is available to authorized users.
format Online
Article
Text
id pubmed-6591930
institution National Center for Biotechnology Information
language English
publishDate 2019
publisher Springer International Publishing
record_format MEDLINE/PubMed
spelling pubmed-6591930 2019-07-10 J Cheminform Research Article Springer International Publishing 2019-06-24 /pmc/articles/PMC6591930/ /pubmed/31236786 http://dx.doi.org/10.1186/s13321-019-0363-6 Text en © The Author(s) 2019 Open Access. This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title Next generation community assessment of biomedical entity recognition web servers: metrics, performance, interoperability aspects of BeCalm
topic Research Article