Biomedical named entity extraction: some issues of corpus compatibilities

BACKGROUND: Named Entity (NE) extraction is one of the most fundamental and important tasks in biomedical information extraction. It involves identification of certain entities from text and their classification into some predefined categories. In the biomedical community, there is yet no general co...

Bibliographic Details
Main Authors:	Ekbal, Asif; Saha, Sriparna; Sikdar, Utpal Kumar
Format:	Online Article Text
Language:	English
Published:	Springer International Publishing 2013
Subjects:	Research
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3837077/ https://www.ncbi.nlm.nih.gov/pubmed/24294548 http://dx.doi.org/10.1186/2193-1801-2-601

author	Ekbal, Asif; Saha, Sriparna; Sikdar, Utpal Kumar
author_sort	Ekbal, Asif
collection	PubMed
description	BACKGROUND: Named Entity (NE) extraction is one of the most fundamental and important tasks in biomedical information extraction. It involves identification of certain entities from text and their classification into some predefined categories. In the biomedical community, there is yet no general consensus regarding named entity (NE) annotation; thus, it is very difficult to compare the existing systems due to corpus incompatibilities. Due to this problem we can not also exploit the advantages of using different corpora together. In our present work we address the issues of corpus compatibilities, and use a single objective optimization (SOO) based classifier ensemble technique that uses the search capability of genetic algorithm (GA) for NE extraction in biomedicine. We hypothesize that the reliability of predictions of each classifier differs among the various output classes. We use Conditional Random Field (CRF) and Support Vector Machine (SVM) frameworks to build a number of models depending upon the various representations of the set of features and/or feature templates. It is to be noted that we tried to extract the features without using any deep domain knowledge and/or resources. RESULTS: In order to assess the challenges of corpus compatibilities, we experiment with the different benchmark datasets and their various combinations. Comparison results with the existing approaches prove the efficacy of the used technique. GA based ensemble achieves around 2% performance improvements over the individual classifiers. Degradation in performance on the integrated corpus clearly shows the difficulties of the task. CONCLUSIONS: In summary, our used ensemble based approach attains the state-of-the-art performance levels for entity extraction in three different kinds of biomedical datasets. The possible reasons behind the better performance in our used approach are the (i) use of variety and rich features as described in Subsection “Features for named entity extraction”; (ii) use of GA based classifier ensemble technique to combine the outputs of multiple classifiers.
format	Online Article Text
id	pubmed-3837077
institution	National Center for Biotechnology Information
language	English
publishDate	2013
publisher	Springer International Publishing
record_format	MEDLINE/PubMed
spelling	pubmed-3837077 2013-11-29 Biomedical named entity extraction: some issues of corpus compatibilities Ekbal, Asif Saha, Sriparna Sikdar, Utpal Kumar Springerplus Research Springer International Publishing 2013-11-12 /pmc/articles/PMC3837077/ /pubmed/24294548 http://dx.doi.org/10.1186/2193-1801-2-601 Text en © Ekbal et al.; licensee Springer. 2013 This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title	Biomedical named entity extraction: some issues of corpus compatibilities
topic	Research
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3837077/ https://www.ncbi.nlm.nih.gov/pubmed/24294548 http://dx.doi.org/10.1186/2193-1801-2-601
