The CHEMDNER corpus of chemicals and drugs and its annotation principles
The automatic extraction of chemical information from text requires the recognition of chemical entity mentions as one of its key steps. When developing supervised named entity recognition (NER) systems, the availability of a large, manually annotated text corpus is desirable. Furthermore, large cor...
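Main authors: | Krallinger, Martin; Rabal, Obdulia; Leitner, Florian; Vazquez, Miguel; Salgado, David; Lu, Zhiyong; Leaman, Robert; Lu, Yanan; Ji, Donghong; Lowe, Daniel M; Sayle, Roger A; Batista-Navarro, Riza Theresa; Rak, Rafal; Huber, Torsten; Rocktäschel, Tim; Matos, Sérgio; Campos, David; Tang, Buzhou; Xu, Hua; Munkhdalai, Tsendsuren; Ryu, Keun Ho; Ramanan, SV; Nathan, Senthil; Žitnik, Slavko; Bajec, Marko; Weber, Lutz; Irmer, Matthias; Akhondi, Saber A; Kors, Jan A; Xu, Shuo; An, Xin; Sikdar, Utpal Kumar; Ekbal, Asif; Yoshioka, Masaharu; Dieb, Thaer M; Choi, Miji; Verspoor, Karin; Khabsa, Madian; Giles, C Lee; Liu, Hongfang; Ravikumar, Komandur Elayavilli; Lamurias, Andre; Couto, Francisco M; Dai, Hong-Jie; Tsai, Richard Tzong-Han; Ata, Caglar; Can, Tolga; Usié, Anabel; Alves, Rui; Segura-Bedmar, Isabel; Martínez, Paloma; Oyarzabal, Julen; Valencia, Alfonso |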
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central, 2015 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4331692/ https://www.ncbi.nlm.nih.gov/pubmed/25810773 http://dx.doi.org/10.1186/1758-2946-7-S1-S2 |
_version_ | 1782357759947177984 |
---|---|
author | Krallinger, Martin; Rabal, Obdulia; Leitner, Florian; Vazquez, Miguel; Salgado, David; Lu, Zhiyong; Leaman, Robert; Lu, Yanan; Ji, Donghong; Lowe, Daniel M; Sayle, Roger A; Batista-Navarro, Riza Theresa; Rak, Rafal; Huber, Torsten; Rocktäschel, Tim; Matos, Sérgio; Campos, David; Tang, Buzhou; Xu, Hua; Munkhdalai, Tsendsuren; Ryu, Keun Ho; Ramanan, SV; Nathan, Senthil; Žitnik, Slavko; Bajec, Marko; Weber, Lutz; Irmer, Matthias; Akhondi, Saber A; Kors, Jan A; Xu, Shuo; An, Xin; Sikdar, Utpal Kumar; Ekbal, Asif; Yoshioka, Masaharu; Dieb, Thaer M; Choi, Miji; Verspoor, Karin; Khabsa, Madian; Giles, C Lee; Liu, Hongfang; Ravikumar, Komandur Elayavilli; Lamurias, Andre; Couto, Francisco M; Dai, Hong-Jie; Tsai, Richard Tzong-Han; Ata, Caglar; Can, Tolga; Usié, Anabel; Alves, Rui; Segura-Bedmar, Isabel; Martínez, Paloma; Oyarzabal, Julen; Valencia, Alfonso |
author_sort | Krallinger, Martin |
collection | PubMed |
description | The automatic extraction of chemical information from text requires the recognition of chemical entity mentions as one of its key steps. When developing supervised named entity recognition (NER) systems, the availability of a large, manually annotated text corpus is desirable. Furthermore, large corpora permit the robust evaluation and comparison of different approaches that detect chemicals in documents. We present the CHEMDNER corpus, a collection of 10,000 PubMed abstracts that contain a total of 84,355 chemical entity mentions labeled manually by expert chemistry literature curators, following annotation guidelines specifically defined for this task. The abstracts of the CHEMDNER corpus were selected to be representative of all major chemical disciplines. Each of the chemical entity mentions was manually labeled according to its structure-associated chemical entity mention (SACEM) class: abbreviation, family, formula, identifier, multiple, systematic and trivial. The difficulty and consistency of tagging chemicals in text were measured using an agreement study between annotators, obtaining a percentage agreement of 91%. For a subset of the CHEMDNER corpus (the test set of 3,000 abstracts) we provide not only the Gold Standard manual annotations, but also mentions automatically detected by the 26 teams that participated in the BioCreative IV CHEMDNER chemical mention recognition task. In addition, we release the CHEMDNER silver standard corpus of automatically extracted mentions from 17,000 randomly selected PubMed abstracts. A version of the CHEMDNER corpus in the BioC format has been generated as well. We propose a standard for required minimum information about entity annotations for the construction of domain-specific corpora on chemical and drug entities. The CHEMDNER corpus and annotation guidelines are available at: http://www.biocreative.org/resources/biocreative-iv/chemdner-corpus/ |
format | Online Article Text |
id | pubmed-4331692 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2015 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-4331692 2015-03-25 The CHEMDNER corpus of chemicals and drugs and its annotation principles J Cheminform Research BioMed Central 2015-01-19 /pmc/articles/PMC4331692/ /pubmed/25810773 http://dx.doi.org/10.1186/1758-2946-7-S1-S2 Text en Copyright © 2015 Krallinger et al.; licensee Springer. http://creativecommons.org/licenses/by/4.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated. |
title | The CHEMDNER corpus of chemicals and drugs and its annotation principles |
topic | Research |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4331692/ https://www.ncbi.nlm.nih.gov/pubmed/25810773 http://dx.doi.org/10.1186/1758-2946-7-S1-S2 |
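
The description reports a percentage agreement between annotators but does not say in this record how it was computed. As a rough illustration only, the sketch below assumes one common definition: the share of mention spans marked identically by two annotators, out of all spans marked by either. The `Mention` type, its field names, and the `percentage_agreement` function are hypothetical and are not taken from the CHEMDNER distribution; the seven class labels are the SACEM classes listed in the description.

```python
from typing import Iterable, NamedTuple

# The seven SACEM classes named in the record's description.
SACEM_CLASSES = {
    "ABBREVIATION", "FAMILY", "FORMULA", "IDENTIFIER",
    "MULTIPLE", "SYSTEMATIC", "TRIVIAL",
}


class Mention(NamedTuple):
    """Hypothetical container for one annotated chemical mention."""
    doc_id: str       # e.g. a PubMed identifier
    start: int        # character offset where the mention begins
    end: int          # character offset where the mention ends
    sacem_class: str  # one of SACEM_CLASSES


def percentage_agreement(
    a: Iterable[Mention],
    b: Iterable[Mention],
    match_class: bool = False,
) -> float:
    """Assumed definition: 100 * |A & B| / |A | B| over exact spans.

    With match_class=True, two mentions only count as agreeing when
    their SACEM class is the same as well.
    """
    def key(m: Mention) -> tuple:
        base = (m.doc_id, m.start, m.end)
        return base + ((m.sacem_class,) if match_class else ())

    keys_a = {key(m) for m in a}
    keys_b = {key(m) for m in b}
    union = keys_a | keys_b
    if not union:
        # Neither annotator marked anything, so they trivially agree.
        return 100.0
    return 100.0 * len(keys_a & keys_b) / len(union)


if __name__ == "__main__":
    # Made-up offsets for illustration; not real corpus data.
    annotator_1 = [
        Mention("doc1", 0, 7, "TRIVIAL"),
        Mention("doc1", 10, 14, "FORMULA"),
        Mention("doc1", 20, 25, "FAMILY"),
    ]
    annotator_2 = [
        Mention("doc1", 0, 7, "TRIVIAL"),
        Mention("doc1", 10, 14, "FORMULA"),
        Mention("doc1", 30, 36, "SYSTEMATIC"),
    ]
    print(percentage_agreement(annotator_1, annotator_2))  # 50.0
```

Exact-span matching is a deliberately strict choice here; a study could equally count overlapping spans as agreement, which would give a higher figure on the same annotations.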