CHEMDNER: The drugs and chemical names extraction challenge

Natural language processing (NLP) and text mining technologies for the chemical domain (ChemNLP or chemical text mining) are key to improve the access and integration of information from unstructured data such as patents or the scientific literature. Therefore, the BioCreative organizers posed the C...

Bibliographic Details
Main Authors:	Krallinger, Martin; Leitner, Florian; Rabal, Obdulia; Vazquez, Miguel; Oyarzabal, Julen; Valencia, Alfonso
Format:	Online Article Text
Language:	English
Published:	BioMed Central 2015
Subjects:	Research
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4331685/ https://www.ncbi.nlm.nih.gov/pubmed/25810766 http://dx.doi.org/10.1186/1758-2946-7-S1-S1

author	Krallinger, Martin; Leitner, Florian; Rabal, Obdulia; Vazquez, Miguel; Oyarzabal, Julen; Valencia, Alfonso
author_sort	Krallinger, Martin
collection	PubMed
description	Natural language processing (NLP) and text mining technologies for the chemical domain (ChemNLP or chemical text mining) are key to improve the access and integration of information from unstructured data such as patents or the scientific literature. Therefore, the BioCreative organizers posed the CHEMDNER (chemical compound and drug name recognition) community challenge, which promoted the development of novel, competitive and accessible chemical text mining systems. This task allowed a comparative assessment of the performance of various methodologies using a carefully prepared collection of manually labeled text prepared by specially trained chemists as Gold Standard data. We evaluated two important aspects: one covered the indexing of documents with chemicals (chemical document indexing - CDI task), and the other was concerned with finding the exact mentions of chemicals in text (chemical entity mention recognition - CEM task). 27 teams (23 academic and 4 commercial, a total of 87 researchers) returned results for the CHEMDNER tasks: 26 teams for CEM and 23 for the CDI task. Top scoring teams obtained an F-score of 87.39% for the CEM task and 88.20% for the CDI task, a very promising result when compared to the agreement between human annotators (91%). The strategies used to detect chemicals included machine learning methods (e.g. conditional random fields) using a variety of features, chemistry and drug lexica, and domain-specific rules. We expect that the tools and resources resulting from this effort will have an impact in future developments of chemical text mining applications and will form the basis to find related chemical information for the detected entities, such as toxicological or pharmacogenomic properties.
format	Online Article Text
id	pubmed-4331685
institution	National Center for Biotechnology Information
language	English
publishDate	2015
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-4331685 2015-03-25 CHEMDNER: The drugs and chemical names extraction challenge Krallinger, Martin; Leitner, Florian; Rabal, Obdulia; Vazquez, Miguel; Oyarzabal, Julen; Valencia, Alfonso J Cheminform Research
BioMed Central 2015-01-19 /pmc/articles/PMC4331685/ /pubmed/25810766 http://dx.doi.org/10.1186/1758-2946-7-S1-S1 Text en Copyright © 2015 Krallinger et al.; licensee Springer. http://creativecommons.org/licenses/by/4.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title	CHEMDNER: The drugs and chemical names extraction challenge
topic	Research
