Optimising chemical named entity recognition with pre-processing analytics, knowledge-rich features and heuristics

BACKGROUND: The development of robust methods for chemical named entity recognition, a challenging natural language processing task, was previously hindered by the lack of publicly available, large-scale, gold standard corpora. The recent public release of a large chemical entity-annotated corpus as...

Bibliographic Details
Main authors:	Batista-Navarro, Riza; Rak, Rafal; Ananiadou, Sophia
Format:	Online Article Text
Language:	English
Published:	BioMed Central 2015
Subjects:	Research
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4331696/ https://www.ncbi.nlm.nih.gov/pubmed/25810777 http://dx.doi.org/10.1186/1758-2946-7-S1-S6

_version_	1782357760865730560
author	Batista-Navarro, Riza; Rak, Rafal; Ananiadou, Sophia
author_facet	Batista-Navarro, Riza; Rak, Rafal; Ananiadou, Sophia
author_sort	Batista-Navarro, Riza
collection	PubMed
description	BACKGROUND: The development of robust methods for chemical named entity recognition, a challenging natural language processing task, was previously hindered by the lack of publicly available, large-scale, gold standard corpora. The recent public release of a large chemical entity-annotated corpus as a resource for the CHEMDNER track of the Fourth BioCreative Challenge Evaluation (BioCreative IV) workshop greatly alleviated this problem and allowed us to develop a conditional random fields-based chemical entity recogniser. In order to optimise its performance, we introduced customisations in various aspects of our solution. These include the selection of specialised pre-processing analytics, the incorporation of chemistry knowledge-rich features in the training and application of the statistical model, and the addition of post-processing rules. RESULTS: Our evaluation shows that optimal performance is obtained when our customisations are integrated into the chemical entity recogniser. When its performance is compared with that of state-of-the-art methods, under comparable experimental settings, our solution achieves competitive advantage. We also show that our recogniser that uses a model trained on the CHEMDNER corpus is suitable for recognising names in a wide range of corpora, consistently outperforming two popular chemical NER tools. CONCLUSION: The contributions resulting from this work are two-fold. Firstly, we present the details of a chemical entity recognition methodology that has demonstrated performance at a competitive, if not superior, level as that of state-of-the-art methods. Secondly, the developed suite of solutions has been made publicly available as a configurable workflow in the interoperable text mining workbench Argo. This allows interested users to conveniently apply and evaluate our solutions in the context of other chemical text mining tasks.
format	Online Article Text
id	pubmed-4331696
institution	National Center for Biotechnology Information
language	English
publishDate	2015
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-43316962015-03-25 Optimising chemical named entity recognition with pre-processing analytics, knowledge-rich features and heuristics Batista-Navarro, Riza Rak, Rafal Ananiadou, Sophia J Cheminform Research BACKGROUND: The development of robust methods for chemical named entity recognition, a challenging natural language processing task, was previously hindered by the lack of publicly available, large-scale, gold standard corpora. The recent public release of a large chemical entity-annotated corpus as a resource for the CHEMDNER track of the Fourth BioCreative Challenge Evaluation (BioCreative IV) workshop greatly alleviated this problem and allowed us to develop a conditional random fields-based chemical entity recogniser. In order to optimise its performance, we introduced customisations in various aspects of our solution. These include the selection of specialised pre-processing analytics, the incorporation of chemistry knowledge-rich features in the training and application of the statistical model, and the addition of post-processing rules. RESULTS: Our evaluation shows that optimal performance is obtained when our customisations are integrated into the chemical entity recogniser. When its performance is compared with that of state-of-the-art methods, under comparable experimental settings, our solution achieves competitive advantage. We also show that our recogniser that uses a model trained on the CHEMDNER corpus is suitable for recognising names in a wide range of corpora, consistently outperforming two popular chemical NER tools. CONCLUSION: The contributions resulting from this work are two-fold. Firstly, we present the details of a chemical entity recognition methodology that has demonstrated performance at a competitive, if not superior, level as that of state-of-the-art methods. Secondly, the developed suite of solutions has been made publicly available as a configurable workflow in the interoperable text mining workbench Argo. 
This allows interested users to conveniently apply and evaluate our solutions in the context of other chemical text mining tasks. BioMed Central 2015-01-19 /pmc/articles/PMC4331696/ /pubmed/25810777 http://dx.doi.org/10.1186/1758-2946-7-S1-S6 Text en Copyright © 2015 Batista-Navarro et al.; licensee Springer. http://creativecommons.org/licenses/by/4.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title	Optimising chemical named entity recognition with pre-processing analytics, knowledge-rich features and heuristics
topic	Research
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4331696/ https://www.ncbi.nlm.nih.gov/pubmed/25810777 http://dx.doi.org/10.1186/1758-2946-7-S1-S6
