
Collaborative biocuration—text-mining development task for document prioritization for curation

The Critical Assessment of Information Extraction systems in Biology (BioCreAtIvE) challenge evaluation is a community-wide effort for evaluating text mining and information extraction systems for the biological domain. The ‘BioCreative Workshop 2012’ subcommittee identified three areas, or tracks,...


Bibliographic Details
Main authors: Wiegers, Thomas C.; Davis, Allan Peter; Mattingly, Carolyn J.
Format: Online Article Text
Language: English
Published: Oxford University Press 2012
Subjects: BioCreative Virtual Issue
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3504477/
https://www.ncbi.nlm.nih.gov/pubmed/23180769
http://dx.doi.org/10.1093/database/bas037
_version_ 1782250637759610880
author Wiegers, Thomas C.
Davis, Allan Peter
Mattingly, Carolyn J.
author_facet Wiegers, Thomas C.
Davis, Allan Peter
Mattingly, Carolyn J.
author_sort Wiegers, Thomas C.
collection PubMed
description The Critical Assessment of Information Extraction systems in Biology (BioCreAtIvE) challenge evaluation is a community-wide effort for evaluating text mining and information extraction systems for the biological domain. The ‘BioCreative Workshop 2012’ subcommittee identified three areas, or tracks, that comprised independent, but complementary aspects of data curation in which they sought community input: literature triage (Track I); curation workflow (Track II) and text mining/natural language processing (NLP) systems (Track III). Track I participants were invited to develop tools or systems that would effectively triage and prioritize articles for curation and present results in a prototype web interface. Training and test datasets were derived from the Comparative Toxicogenomics Database (CTD; http://ctdbase.org) and consisted of manuscripts from which chemical–gene–disease data were manually curated. A total of seven groups participated in Track I. For the triage component, the effectiveness of participant systems was measured by aggregate gene, disease and chemical ‘named-entity recognition’ (NER) across articles; the effectiveness of ‘information retrieval’ (IR) was also measured based on ‘mean average precision’ (MAP). Top recall scores for gene, disease and chemical NER were 49, 65 and 82%, respectively; the top MAP score was 80%. Each participating group also developed a prototype web interface; these interfaces were evaluated based on functionality and ease-of-use by CTD’s biocuration project manager. In this article, we present a detailed description of the challenge and a summary of the results.
format Online
Article
Text
id pubmed-3504477
institution National Center for Biotechnology Information
language English
publishDate 2012
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-3504477 2012-11-23 Collaborative biocuration—text-mining development task for document prioritization for curation Wiegers, Thomas C. Davis, Allan Peter Mattingly, Carolyn J. Database (Oxford) BioCreative Virtual Issue
Oxford University Press 2012-11-22 /pmc/articles/PMC3504477/ /pubmed/23180769 http://dx.doi.org/10.1093/database/bas037 Text en © The Author(s) 2012. Published by Oxford University Press. http://creativecommons.org/licenses/by-nc/3.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by-nc/3.0/), which permits non-commercial reuse, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com.
title Collaborative biocuration—text-mining development task for document prioritization for curation
topic BioCreative Virtual Issue
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3504477/
https://www.ncbi.nlm.nih.gov/pubmed/23180769
http://dx.doi.org/10.1093/database/bas037