GRAM-CNN: a deep learning approach with local context for named entity recognition in biomedical text

MOTIVATION: Best performing named entity recognition (NER) methods for biomedical literature are based on hand-crafted features or task-specific rules, which are costly to produce and difficult to generalize to other corpora. End-to-end neural networks achieve state-of-the-art performance without ha...

Bibliographic Details
Main authors: Zhu, Qile; Li, Xiaolin; Conesa, Ana; Pereira, Cécile
Format: Online Article Text
Language: English
Published: Oxford University Press 2018
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5925775/
https://www.ncbi.nlm.nih.gov/pubmed/29272325
http://dx.doi.org/10.1093/bioinformatics/btx815
_version_ 1783318771780288512
author Zhu, Qile
Li, Xiaolin
Conesa, Ana
Pereira, Cécile
collection PubMed
description MOTIVATION: Best performing named entity recognition (NER) methods for biomedical literature are based on hand-crafted features or task-specific rules, which are costly to produce and difficult to generalize to other corpora. End-to-end neural networks achieve state-of-the-art performance without hand-crafted features and task-specific knowledge in non-biomedical NER tasks. However, in the biomedical domain, using the same architecture does not yield competitive performance compared with conventional machine learning models. RESULTS: We propose a novel end-to-end deep learning approach for biomedical NER tasks that leverages the local contexts based on n-gram character and word embeddings via Convolutional Neural Network (CNN). We call this approach GRAM-CNN. To automatically label a word, this method uses the local information around a word. Therefore, the GRAM-CNN method does not require any specific knowledge or feature engineering and can be theoretically applied to a wide range of existing NER problems. The GRAM-CNN approach was evaluated on three well-known biomedical datasets containing different BioNER entities. It obtained an F1-score of 87.26% on the Biocreative II dataset, 87.26% on the NCBI dataset and 72.57% on the JNLPBA dataset. Those results put GRAM-CNN in the lead of the biological NER methods. To the best of our knowledge, we are the first to apply CNN based structures to BioNER problems. AVAILABILITY AND IMPLEMENTATION: The GRAM-CNN source code, datasets and pre-trained model are available online at: https://github.com/valdersoul/GRAM-CNN. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
format Online
Article
Text
id pubmed-5925775
institution National Center for Biotechnology Information
language English
publishDate 2018
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-5925775 2018-05-04 GRAM-CNN: a deep learning approach with local context for named entity recognition in biomedical text. Zhu, Qile; Li, Xiaolin; Conesa, Ana; Pereira, Cécile. Bioinformatics, Original Papers.
Oxford University Press 2018-05-01 2017-12-20 /pmc/articles/PMC5925775/ /pubmed/29272325 http://dx.doi.org/10.1093/bioinformatics/btx815 Text en © The Author(s) 2017. Published by Oxford University Press. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
title GRAM-CNN: a deep learning approach with local context for named entity recognition in biomedical text
topic Original Papers