Collaborative biocuration—text-mining development task for document prioritization for curation
The Critical Assessment of Information Extraction systems in Biology (BioCreAtIvE) challenge evaluation is a community-wide effort for evaluating text mining and information extraction systems for the biological domain. The ‘BioCreative Workshop 2012’ subcommittee identified three areas, or tracks,...
Main authors: | Wiegers, Thomas C.; Davis, Allan Peter; Mattingly, Carolyn J. |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Oxford University Press, 2012 |
Subjects: | BioCreative Virtual Issue |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3504477/ https://www.ncbi.nlm.nih.gov/pubmed/23180769 http://dx.doi.org/10.1093/database/bas037 |
_version_ | 1782250637759610880 |
---|---|
author | Wiegers, Thomas C. Davis, Allan Peter Mattingly, Carolyn J. |
author_facet | Wiegers, Thomas C. Davis, Allan Peter Mattingly, Carolyn J. |
author_sort | Wiegers, Thomas C. |
collection | PubMed |
description | The Critical Assessment of Information Extraction systems in Biology (BioCreAtIvE) challenge evaluation is a community-wide effort for evaluating text mining and information extraction systems for the biological domain. The ‘BioCreative Workshop 2012’ subcommittee identified three areas, or tracks, that comprised independent, but complementary aspects of data curation in which they sought community input: literature triage (Track I); curation workflow (Track II) and text mining/natural language processing (NLP) systems (Track III). Track I participants were invited to develop tools or systems that would effectively triage and prioritize articles for curation and present results in a prototype web interface. Training and test datasets were derived from the Comparative Toxicogenomics Database (CTD; http://ctdbase.org) and consisted of manuscripts from which chemical–gene–disease data were manually curated. A total of seven groups participated in Track I. For the triage component, the effectiveness of participant systems was measured by aggregate gene, disease and chemical ‘named-entity recognition’ (NER) across articles; the effectiveness of ‘information retrieval’ (IR) was also measured based on ‘mean average precision’ (MAP). Top recall scores for gene, disease and chemical NER were 49, 65 and 82%, respectively; the top MAP score was 80%. Each participating group also developed a prototype web interface; these interfaces were evaluated based on functionality and ease-of-use by CTD’s biocuration project manager. In this article, we present a detailed description of the challenge and a summary of the results. |
format | Online Article Text |
id | pubmed-3504477 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2012 |
publisher | Oxford University Press |
record_format | MEDLINE/PubMed |
spelling | pubmed-3504477 2012-11-23 Collaborative biocuration—text-mining development task for document prioritization for curation Wiegers, Thomas C. Davis, Allan Peter Mattingly, Carolyn J. Database (Oxford) BioCreative Virtual Issue The Critical Assessment of Information Extraction systems in Biology (BioCreAtIvE) challenge evaluation is a community-wide effort for evaluating text mining and information extraction systems for the biological domain. The ‘BioCreative Workshop 2012’ subcommittee identified three areas, or tracks, that comprised independent, but complementary aspects of data curation in which they sought community input: literature triage (Track I); curation workflow (Track II) and text mining/natural language processing (NLP) systems (Track III). Track I participants were invited to develop tools or systems that would effectively triage and prioritize articles for curation and present results in a prototype web interface. Training and test datasets were derived from the Comparative Toxicogenomics Database (CTD; http://ctdbase.org) and consisted of manuscripts from which chemical–gene–disease data were manually curated. A total of seven groups participated in Track I. For the triage component, the effectiveness of participant systems was measured by aggregate gene, disease and chemical ‘named-entity recognition’ (NER) across articles; the effectiveness of ‘information retrieval’ (IR) was also measured based on ‘mean average precision’ (MAP). Top recall scores for gene, disease and chemical NER were 49, 65 and 82%, respectively; the top MAP score was 80%. Each participating group also developed a prototype web interface; these interfaces were evaluated based on functionality and ease-of-use by CTD’s biocuration project manager. In this article, we present a detailed description of the challenge and a summary of the results.
Oxford University Press 2012-11-22 /pmc/articles/PMC3504477/ /pubmed/23180769 http://dx.doi.org/10.1093/database/bas037 Text en © The Author(s) 2012. Published by Oxford University Press. http://creativecommons.org/licenses/by-nc/3.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by-nc/3.0/), which permits non-commercial reuse, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com. |
spellingShingle | BioCreative Virtual Issue Wiegers, Thomas C. Davis, Allan Peter Mattingly, Carolyn J. Collaborative biocuration—text-mining development task for document prioritization for curation |
title | Collaborative biocuration—text-mining development task for document prioritization for curation |
title_sort | collaborative biocuration—text-mining development task for document prioritization for curation |
topic | BioCreative Virtual Issue |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3504477/ https://www.ncbi.nlm.nih.gov/pubmed/23180769 http://dx.doi.org/10.1093/database/bas037 |