
Chemlistem: chemical named entity recognition using recurrent neural networks

Chemical named entity recognition (NER) has traditionally been dominated by conditional random fields (CRF)-based approaches, but given the success of the artificial neural network techniques known as “deep learning”, we decided to examine them as an alternative to CRFs. We present here several chemic...


Bibliographic Details
Main Authors: Corbett, Peter; Boyle, John
Format: Online Article Text
Language: English
Published: Springer International Publishing 2018
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6755713/
https://www.ncbi.nlm.nih.gov/pubmed/30523437
http://dx.doi.org/10.1186/s13321-018-0313-8
author Corbett, Peter
Boyle, John
author_facet Corbett, Peter
Boyle, John
author_sort Corbett, Peter
collection PubMed
description Chemical named entity recognition (NER) has traditionally been dominated by conditional random fields (CRF)-based approaches, but given the success of the artificial neural network techniques known as “deep learning”, we decided to examine them as an alternative to CRFs. We present here several chemical named entity recognition systems. The first system translates the traditional CRF-based idioms into a deep learning framework, using rich per-token features and neural word embeddings, and producing a sequence of tags using bidirectional long short-term memory (LSTM) networks, a type of recurrent neural net. The second system eschews the rich feature set, and even tokenisation, in favour of character labelling using neural character embeddings and multiple LSTM layers. The third system is an ensemble that combines the results of the first two systems. Our original BioCreative V.5 competition entry was placed in the top group with the highest F scores, and subsequent work using transfer learning has achieved a final F score of 90.33% on the test data (precision 91.47%, recall 89.21%).
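As a quick consistency check on the figures quoted in the description, the F score can be recomputed from the stated precision and recall. This is a minimal sketch that assumes the reported "F score" is the balanced F1 measure (the harmonic mean of precision and recall); the record itself does not say which variant was used, and the function name `f_score` is purely illustrative.

```python
def f_score(precision: float, recall: float) -> float:
    """Balanced F score (F1): harmonic mean of precision and recall.

    Assumption: the record's "F score" is F1; the source does not state this.
    """
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall as quoted in the description, in percent.
precision, recall = 91.47, 89.21
print(round(f_score(precision, recall), 2))  # prints 90.33
```

Under that assumption the recomputed value agrees with the 90.33% stated in the description.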
format Online
Article
Text
id pubmed-6755713
institution National Center for Biotechnology Information
language English
publishDate 2018
publisher Springer International Publishing
record_format MEDLINE/PubMed
spelling pubmed-6755713 2019-09-26 Chemlistem: chemical named entity recognition using recurrent neural networks Corbett, Peter Boyle, John J Cheminform Research Article Chemical named entity recognition (NER) has traditionally been dominated by conditional random fields (CRF)-based approaches, but given the success of the artificial neural network techniques known as “deep learning”, we decided to examine them as an alternative to CRFs. We present here several chemical named entity recognition systems. The first system translates the traditional CRF-based idioms into a deep learning framework, using rich per-token features and neural word embeddings, and producing a sequence of tags using bidirectional long short-term memory (LSTM) networks, a type of recurrent neural net. The second system eschews the rich feature set, and even tokenisation, in favour of character labelling using neural character embeddings and multiple LSTM layers. The third system is an ensemble that combines the results of the first two systems. Our original BioCreative V.5 competition entry was placed in the top group with the highest F scores, and subsequent work using transfer learning has achieved a final F score of 90.33% on the test data (precision 91.47%, recall 89.21%). Springer International Publishing 2018-12-06 /pmc/articles/PMC6755713/ /pubmed/30523437 http://dx.doi.org/10.1186/s13321-018-0313-8 Text en © The Author(s) 2018 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.
The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title Chemlistem: chemical named entity recognition using recurrent neural networks
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6755713/
https://www.ncbi.nlm.nih.gov/pubmed/30523437
http://dx.doi.org/10.1186/s13321-018-0313-8