
Developing a hybrid dictionary-based bio-entity recognition technique

BACKGROUND: Bio-entity extraction is a pivotal component for information extraction from biomedical literature. The dictionary-based bio-entity extraction is the first generation of Named Entity Recognition (NER) techniques. METHODS: This paper presents a hybrid dictionary-based bio-entity extractio...

Bibliographic Details
Main authors: Song, Min; Yu, Hwanjo; Han, Wook-Shin
Format: Online Article Text
Language: English
Published: BioMed Central 2015
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4460617/
https://www.ncbi.nlm.nih.gov/pubmed/26043907
http://dx.doi.org/10.1186/1472-6947-15-S1-S9
author Song, Min
Yu, Hwanjo
Han, Wook-Shin
collection PubMed
description BACKGROUND: Bio-entity extraction is a pivotal component of information extraction from biomedical literature. Dictionary-based bio-entity extraction is the first generation of Named Entity Recognition (NER) techniques. METHODS: This paper presents a hybrid dictionary-based bio-entity extraction technique. The approach expands the bio-entity dictionary by combining different data sources and improves recall through the shortest path edit distance algorithm. In addition, when merging similar entities, the proposed technique adopts text mining techniques such as Part of Speech (POS) expansion, stemming, and the exploitation of contextual cues to further improve performance. RESULTS: The experimental results show that, in F-measure, the proposed technique achieves the best or at least equivalent performance among the compared techniques: GENIA, MESH, UMLS, and combinations of these three resources. CONCLUSIONS: The results imply that the performance of dictionary-based extraction techniques is largely influenced by the information resources used to build the dictionary. In addition, the edit distance algorithm shows steady precision across the three different dictionaries, whereas the context-only technique achieves high-end recall across the three different dictionaries.
format Online
Article
Text
id pubmed-4460617
institution National Center for Biotechnology Information
language English
publishDate 2015
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-4460617 2015-06-29 Developing a hybrid dictionary-based bio-entity recognition technique Song, Min Yu, Hwanjo Han, Wook-Shin BMC Med Inform Decis Mak Research Article BioMed Central 2015-05-20 /pmc/articles/PMC4460617/ /pubmed/26043907 http://dx.doi.org/10.1186/1472-6947-15-S1-S9 Text en Copyright © 2015 Song et al.; licensee BioMed Central Ltd.
http://creativecommons.org/licenses/by/4.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title Developing a hybrid dictionary-based bio-entity recognition technique
topic Research Article
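
The abstract describes merging several term sources into one dictionary and recovering near-miss mentions through edit distance. The following is a minimal Python sketch of that general idea, not the authors' implementation. It assumes a plain Levenshtein distance in place of the paper's "shortest path edit distance algorithm" (which this record does not specify), and it uses a crude lowercase and hyphen normalization as a stand-in for stemming. The function names, the `max_distance` threshold, and the sample dictionary entries are all hypothetical.

```python
from typing import Dict, Optional, Tuple


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed with two rolling rows."""
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution
            ))
        previous = current
    return previous[-1]


def normalize(term: str) -> str:
    """Lowercase, turn hyphens into spaces, collapse whitespace.

    Assumption: a simple stand-in for the stemming step in the abstract.
    """
    return " ".join(term.lower().replace("-", " ").split())


def merge_dictionaries(*sources: Dict[str, str]) -> Dict[str, str]:
    """Combine term -> label maps; the first source to define a term wins."""
    merged: Dict[str, str] = {}
    for source in sources:
        for term, label in source.items():
            merged.setdefault(normalize(term), label)
    return merged


def lookup(candidate: str, dictionary: Dict[str, str],
           max_distance: int = 1) -> Optional[Tuple[str, str, int]]:
    """Return (term, label, distance) for the closest entry, or None."""
    key = normalize(candidate)
    if key in dictionary:
        return key, dictionary[key], 0
    best: Optional[Tuple[str, str, int]] = None
    for term, label in dictionary.items():
        # A length gap larger than the threshold cannot be within range.
        if abs(len(term) - len(key)) > max_distance:
            continue
        distance = edit_distance(key, term)
        if distance <= max_distance and (best is None or distance < best[2]):
            best = (term, label, distance)
    return best


# Hypothetical entries standing in for resources such as GENIA or UMLS.
source_a = {"Interleukin-2": "protein", "NF-kappa B": "protein"}
source_b = {"T cell": "cell_type", "interleukin 2": "protein"}
merged = merge_dictionaries(source_a, source_b)

print(lookup("interleukn 2", merged))  # misspelled mention, one edit away
print(lookup("hemoglobin", merged))    # no entry within the threshold
```

Raising `max_distance` would trade precision for recall in this sketch, which is the same trade-off the abstract reports measuring across dictionaries.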