BCC-NER: bidirectional, contextual clues named entity tagger for gene/protein mention recognition

Tagging biomedical entities such as gene, protein, cell, and cell-line is the first step and an important pre-requisite in biomedical literature mining. In this paper, we describe our hybrid named entity tagging approach namely BCC-NER (bidirectional, contextual clues named entity tagger for gene/pr...

Bibliographic Details
Main Authors:	Murugesan, Gurusamy; Abdulkadhar, Sabenabanu; Bhasuran, Balu; Natarajan, Jeyakumar
Format:	Online Article Text
Language:	English
Published:	Springer International Publishing 2017
Subjects:	Research
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5419958/ https://www.ncbi.nlm.nih.gov/pubmed/28477208 http://dx.doi.org/10.1186/s13637-017-0060-6

collection	PubMed
description	Tagging biomedical entities such as genes, proteins, cells, and cell lines is the first step and an important prerequisite in biomedical literature mining. In this paper, we describe our hybrid named entity tagging approach, BCC-NER (bidirectional, contextual clues named entity tagger for gene/protein mention recognition). BCC-NER consists of three modules. The first module handles text processing, which includes basic NLP pre-processing, feature extraction, and feature selection. The second module handles training and model building: bidirectional conditional random fields (CRFs) parse the text in both directions (forward and backward), and the forward- and backward-trained models are integrated using the margin-infused relaxed algorithm (MIRA). The third and final module handles post-processing to improve performance, which includes surrounding text features, parenthesis mismatching, and a two-tier abbreviation algorithm. On the BioCreative II GM test corpus, BCC-NER achieves a precision of 89.95, a recall of 84.15, and an overall F-score of 86.95, which is higher than that of the other currently available open source taggers.
format	Online Article Text
id	pubmed-5419958
institution	National Center for Biotechnology Information
language	English
publishDate	2017
publisher	Springer International Publishing
record_format	MEDLINE/PubMed
spelling	pubmed-5419958 2017-05-22 EURASIP J Bioinform Syst Biol Research Springer International Publishing 2017-05-05 /pmc/articles/PMC5419958/ /pubmed/28477208 http://dx.doi.org/10.1186/s13637-017-0060-6 Text en © The Author(s). 2017 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.
title	BCC-NER: bidirectional, contextual clues named entity tagger for gene/protein mention recognition
topic	Research
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5419958/ https://www.ncbi.nlm.nih.gov/pubmed/28477208 http://dx.doi.org/10.1186/s13637-017-0060-6
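
The scores reported in the abstract can be cross-checked against one another. The sketch below assumes that the "overall F-score" is the balanced F1 measure, that is, the harmonic mean of precision and recall; the abstract does not name the variant, so this is an assumption. The helper name f_score and its beta parameter are hypothetical and are not taken from the BCC-NER software.

```python
def f_score(precision: float, recall: float, beta: float = 1.0) -> float:
    """Return the F-beta score; beta = 1.0 gives the harmonic mean (F1).

    Hypothetical helper for checking the reported numbers, not part of BCC-NER.
    """
    if precision <= 0 or recall <= 0:
        # By convention the score is 0 when either input is 0.
        return 0.0
    beta_sq = beta * beta
    return (1 + beta_sq) * precision * recall / (beta_sq * precision + recall)


# Values as reported in the abstract (BioCreative II GM test corpus).
reported_precision = 89.95
reported_recall = 84.15

f1 = f_score(reported_precision, reported_recall)
print(round(f1, 2))  # prints 86.95
```

Under that assumption the computed value matches the reported F-score of 86.95, so the three figures in the abstract are mutually consistent.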
