
Evaluating Word Representation Features in Biomedical Named Entity Recognition Tasks

Biomedical Named Entity Recognition (BNER), which extracts important entities such as genes and proteins, is a crucial step of natural language processing in the biomedical domain. Various machine learning-based approaches have been applied to BNER tasks and showed good performance. In this paper, w...

Bibliographic Details
Main Authors: Tang, Buzhou, Cao, Hongxin, Wang, Xiaolong, Chen, Qingcai, Xu, Hua
Format: Online Article Text
Language: English
Published: Hindawi Publishing Corporation 2014
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3963372/
https://www.ncbi.nlm.nih.gov/pubmed/24729964
http://dx.doi.org/10.1155/2014/240403
_version_ 1782308504158076928
author Tang, Buzhou
Cao, Hongxin
Wang, Xiaolong
Chen, Qingcai
Xu, Hua
author_facet Tang, Buzhou
Cao, Hongxin
Wang, Xiaolong
Chen, Qingcai
Xu, Hua
author_sort Tang, Buzhou
collection PubMed
description Biomedical Named Entity Recognition (BNER), which extracts important entities such as genes and proteins, is a crucial step of natural language processing in the biomedical domain. Various machine learning-based approaches have been applied to BNER tasks and showed good performance. In this paper, we systematically investigated three different types of word representation (WR) features for BNER, including clustering-based representation, distributional representation, and word embeddings. We selected one algorithm from each of the three types of WR features and applied them to the JNLPBA and BioCreAtIvE II BNER tasks. Our results showed that all the three WR algorithms were beneficial to machine learning-based BNER systems. Moreover, combining these different types of WR features further improved BNER performance, indicating that they are complementary to each other. By combining all the three types of WR features, the improvements in F-measure on the BioCreAtIvE II GM and JNLPBA corpora were 3.75% and 1.39%, respectively, when compared with the systems using baseline features. To the best of our knowledge, this is the first study to systematically evaluate the effect of three different types of WR features for BNER tasks.
format Online
Article
Text
id pubmed-3963372
institution National Center for Biotechnology Information
language English
publishDate 2014
publisher Hindawi Publishing Corporation
record_format MEDLINE/PubMed
spelling pubmed-3963372 2014-04-13 Evaluating Word Representation Features in Biomedical Named Entity Recognition Tasks Tang, Buzhou Cao, Hongxin Wang, Xiaolong Chen, Qingcai Xu, Hua Biomed Res Int Research Article Biomedical Named Entity Recognition (BNER), which extracts important entities such as genes and proteins, is a crucial step of natural language processing in the biomedical domain. Various machine learning-based approaches have been applied to BNER tasks and showed good performance. In this paper, we systematically investigated three different types of word representation (WR) features for BNER, including clustering-based representation, distributional representation, and word embeddings. We selected one algorithm from each of the three types of WR features and applied them to the JNLPBA and BioCreAtIvE II BNER tasks. Our results showed that all the three WR algorithms were beneficial to machine learning-based BNER systems. Moreover, combining these different types of WR features further improved BNER performance, indicating that they are complementary to each other. By combining all the three types of WR features, the improvements in F-measure on the BioCreAtIvE II GM and JNLPBA corpora were 3.75% and 1.39%, respectively, when compared with the systems using baseline features. To the best of our knowledge, this is the first study to systematically evaluate the effect of three different types of WR features for BNER tasks. Hindawi Publishing Corporation 2014 2014-03-06 /pmc/articles/PMC3963372/ /pubmed/24729964 http://dx.doi.org/10.1155/2014/240403 Text en Copyright © 2014 Buzhou Tang et al. https://creativecommons.org/licenses/by/3.0/ This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
spellingShingle Research Article
Tang, Buzhou
Cao, Hongxin
Wang, Xiaolong
Chen, Qingcai
Xu, Hua
Evaluating Word Representation Features in Biomedical Named Entity Recognition Tasks
title Evaluating Word Representation Features in Biomedical Named Entity Recognition Tasks
title_full Evaluating Word Representation Features in Biomedical Named Entity Recognition Tasks
title_fullStr Evaluating Word Representation Features in Biomedical Named Entity Recognition Tasks
title_full_unstemmed Evaluating Word Representation Features in Biomedical Named Entity Recognition Tasks
title_short Evaluating Word Representation Features in Biomedical Named Entity Recognition Tasks
title_sort evaluating word representation features in biomedical named entity recognition tasks
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3963372/
https://www.ncbi.nlm.nih.gov/pubmed/24729964
http://dx.doi.org/10.1155/2014/240403