The CHEMDNER corpus of chemicals and drugs and its annotation principles

The automatic extraction of chemical information from text requires the recognition of chemical entity mentions as one of its key steps. When developing supervised named entity recognition (NER) systems, the availability of a large, manually annotated text corpus is desirable. Furthermore, large cor...

Bibliographic Details
Main authors: Krallinger, Martin, Rabal, Obdulia, Leitner, Florian, Vazquez, Miguel, Salgado, David, Lu, Zhiyong, Leaman, Robert, Lu, Yanan, Ji, Donghong, Lowe, Daniel M, Sayle, Roger A, Batista-Navarro, Riza Theresa, Rak, Rafal, Huber, Torsten, Rocktäschel, Tim, Matos, Sérgio, Campos, David, Tang, Buzhou, Xu, Hua, Munkhdalai, Tsendsuren, Ryu, Keun Ho, Ramanan, SV, Nathan, Senthil, Žitnik, Slavko, Bajec, Marko, Weber, Lutz, Irmer, Matthias, Akhondi, Saber A, Kors, Jan A, Xu, Shuo, An, Xin, Sikdar, Utpal Kumar, Ekbal, Asif, Yoshioka, Masaharu, Dieb, Thaer M, Choi, Miji, Verspoor, Karin, Khabsa, Madian, Giles, C Lee, Liu, Hongfang, Ravikumar, Komandur Elayavilli, Lamurias, Andre, Couto, Francisco M, Dai, Hong-Jie, Tsai, Richard Tzong-Han, Ata, Caglar, Can, Tolga, Usié, Anabel, Alves, Rui, Segura-Bedmar, Isabel, Martínez, Paloma, Oyarzabal, Julen, Valencia, Alfonso
Format: Online Article Text
Language: English
Published: BioMed Central 2015
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4331692/
https://www.ncbi.nlm.nih.gov/pubmed/25810773
http://dx.doi.org/10.1186/1758-2946-7-S1-S2
_version_ 1782357759947177984
author Krallinger, Martin
Rabal, Obdulia
Leitner, Florian
Vazquez, Miguel
Salgado, David
Lu, Zhiyong
Leaman, Robert
Lu, Yanan
Ji, Donghong
Lowe, Daniel M
Sayle, Roger A
Batista-Navarro, Riza Theresa
Rak, Rafal
Huber, Torsten
Rocktäschel, Tim
Matos, Sérgio
Campos, David
Tang, Buzhou
Xu, Hua
Munkhdalai, Tsendsuren
Ryu, Keun Ho
Ramanan, SV
Nathan, Senthil
Žitnik, Slavko
Bajec, Marko
Weber, Lutz
Irmer, Matthias
Akhondi, Saber A
Kors, Jan A
Xu, Shuo
An, Xin
Sikdar, Utpal Kumar
Ekbal, Asif
Yoshioka, Masaharu
Dieb, Thaer M
Choi, Miji
Verspoor, Karin
Khabsa, Madian
Giles, C Lee
Liu, Hongfang
Ravikumar, Komandur Elayavilli
Lamurias, Andre
Couto, Francisco M
Dai, Hong-Jie
Tsai, Richard Tzong-Han
Ata, Caglar
Can, Tolga
Usié, Anabel
Alves, Rui
Segura-Bedmar, Isabel
Martínez, Paloma
Oyarzabal, Julen
Valencia, Alfonso
author_sort Krallinger, Martin
collection PubMed
description The automatic extraction of chemical information from text requires the recognition of chemical entity mentions as one of its key steps. When developing supervised named entity recognition (NER) systems, the availability of a large, manually annotated text corpus is desirable. Furthermore, large corpora permit the robust evaluation and comparison of different approaches that detect chemicals in documents. We present the CHEMDNER corpus, a collection of 10,000 PubMed abstracts that contain a total of 84,355 chemical entity mentions labeled manually by expert chemistry literature curators, following annotation guidelines specifically defined for this task. The abstracts of the CHEMDNER corpus were selected to be representative of all major chemical disciplines. Each of the chemical entity mentions was manually labeled according to its structure-associated chemical entity mention (SACEM) class: abbreviation, family, formula, identifier, multiple, systematic and trivial. The difficulty and consistency of tagging chemicals in text were measured using an agreement study between annotators, obtaining a percentage agreement of 91%. For a subset of the CHEMDNER corpus (the test set of 3,000 abstracts) we provide not only the Gold Standard manual annotations, but also mentions automatically detected by the 26 teams that participated in the BioCreative IV CHEMDNER chemical mention recognition task. In addition, we release the CHEMDNER silver standard corpus of automatically extracted mentions from 17,000 randomly selected PubMed abstracts. A version of the CHEMDNER corpus in the BioC format has been generated as well. We propose a standard for required minimum information about entity annotations for the construction of domain-specific corpora on chemical and drug entities. The CHEMDNER corpus and annotation guidelines are available at: http://www.biocreative.org/resources/biocreative-iv/chemdner-corpus/
format Online
Article
Text
id pubmed-4331692
institution National Center for Biotechnology Information
language English
publishDate 2015
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-4331692 2015-03-25 The CHEMDNER corpus of chemicals and drugs and its annotation principles J Cheminform Research BioMed Central 2015-01-19 /pmc/articles/PMC4331692/ /pubmed/25810773 http://dx.doi.org/10.1186/1758-2946-7-S1-S2 Text en Copyright © 2015 Krallinger et al.; licensee Springer. http://creativecommons.org/licenses/by/4.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title The CHEMDNER corpus of chemicals and drugs and its annotation principles
title_sort chemdner corpus of chemicals and drugs and its annotation principles
topic Research
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4331692/
https://www.ncbi.nlm.nih.gov/pubmed/25810773
http://dx.doi.org/10.1186/1758-2946-7-S1-S2