BioCreative V CDR task corpus: a resource for chemical disease relation extraction
Main authors: | Li, Jiao; Sun, Yueping; Johnson, Robin J.; Sciaky, Daniela; Wei, Chih-Hsuan; Leaman, Robert; Davis, Allan Peter; Mattingly, Carolyn J.; Wiegers, Thomas C.; Lu, Zhiyong |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Oxford University Press, 2016 |
Subjects: | Original Article |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4860626/ https://www.ncbi.nlm.nih.gov/pubmed/27161011 http://dx.doi.org/10.1093/database/baw068 |
author | Li, Jiao; Sun, Yueping; Johnson, Robin J.; Sciaky, Daniela; Wei, Chih-Hsuan; Leaman, Robert; Davis, Allan Peter; Mattingly, Carolyn J.; Wiegers, Thomas C.; Lu, Zhiyong |
---|---|
collection | PubMed |
description | Community-run, formal evaluations and manually annotated text corpora are critically important for advancing biomedical text-mining research. Recently in BioCreative V, a new challenge was organized for the tasks of disease named entity recognition (DNER) and chemical-induced disease (CID) relation extraction. Given the nature of both tasks, a test collection is required to contain both disease/chemical annotations and relation annotations in the same set of articles. Despite previous efforts in biomedical corpus construction, none was found to be sufficient for the task. Thus, we developed our own corpus called BC5CDR during the challenge by inviting a team of Medical Subject Headings (MeSH) indexers for disease/chemical entity annotation and Comparative Toxicogenomics Database (CTD) curators for CID relation annotation. To ensure high annotation quality and productivity, detailed annotation guidelines and automatic annotation tools were provided. The resulting BC5CDR corpus consists of 1500 PubMed articles with 4409 annotated chemicals, 5818 diseases and 3116 chemical-disease interactions. Each entity annotation includes both the mention text spans and normalized concept identifiers, using MeSH as the controlled vocabulary. To ensure accuracy, the entities were first captured independently by two annotators followed by a consensus annotation: the average inter-annotator agreement (IAA) scores in the test set were 87.49% for diseases and 96.05% for chemicals, according to the Jaccard similarity coefficient. Our corpus was successfully used for the BioCreative V challenge tasks and should serve as a valuable resource for the text-mining research community. Database URL: http://www.biocreative.org/tasks/biocreative-v/track-3-cdr/ |
format | Online Article Text |
id | pubmed-4860626 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2016 |
publisher | Oxford University Press |
record_format | MEDLINE/PubMed |
journal | Database (Oxford), Original Article, published 2016-05-08 |
rights | Published by Oxford University Press 2016. This work is written by US Government employees and is in the public domain in the United States. |
title | BioCreative V CDR task corpus: a resource for chemical disease relation extraction |
topic | Original Article |
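The abstract reports inter-annotator agreement as a Jaccard similarity coefficient, |A ∩ B| / |A ∪ B|, between two annotators' entity annotations. The Python sketch below illustrates that computation under stated assumptions, not the authors' exact protocol: each annotation is a hypothetical `(start, end, concept_id)` tuple that must match exactly, agreement is computed per document and then averaged, and two empty annotation sets are treated as full agreement. The function names `jaccard` and `average_iaa` and the toy concept IDs are illustrative only; the paper may define matching (for example, span-only versus span plus MeSH ID) differently.

```python
from typing import Iterable, Set, Tuple

# Hypothetical annotation representation (assumption): (start_offset, end_offset, concept_id).
Annotation = Tuple[int, int, str]


def jaccard(a: Set[Annotation], b: Set[Annotation]) -> float:
    """Jaccard similarity |A & B| / |A | B| over exact-match annotation tuples.

    Assumption: if neither annotator marked anything, agreement is 1.0.
    """
    union = a | b
    if not union:
        return 1.0
    return len(a & b) / len(union)


def average_iaa(pairs: Iterable[Tuple[Set[Annotation], Set[Annotation]]]) -> float:
    """Mean of per-document Jaccard scores over (annotator_1, annotator_2) pairs."""
    scores = [jaccard(a, b) for a, b in pairs]
    return sum(scores) / len(scores) if scores else 0.0


if __name__ == "__main__":
    # Toy data with placeholder concept IDs; not taken from the BC5CDR corpus.
    doc1_a = {(0, 9, "C1"), (20, 31, "C2")}
    doc1_b = {(0, 9, "C1"), (40, 48, "C3")}  # one shared annotation out of three distinct
    doc2_a = {(5, 12, "C4")}
    doc2_b = {(5, 12, "C4")}  # perfect agreement
    print(f"{average_iaa([(doc1_a, doc1_b), (doc2_a, doc2_b)]):.4f}")
```

With the toy data, the first document scores 1/3 and the second 1.0, so the average is about 0.6667. Using sets of exact tuples keeps the sketch strict; a relaxed variant would need a different overlap rule for partially matching spans.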