
Comparison of named entity recognition methodologies in biomedical documents

BACKGROUND: Biomedical named entity recognition (Bio-NER) is a fundamental task in handling biomedical text terms, such as RNA, protein, cell type, cell line, and DNA. Bio-NER is one of the most elementary and core tasks in biomedical knowledge discovery from texts. The system described here is deve...

Bibliographic Details
Main authors: Song, Hye-Jeong, Jo, Byeong-Cheol, Park, Chan-Young, Kim, Jong-Dae, Kim, Yu-Seop
Format: Online Article Text
Language: English
Published: BioMed Central 2018
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6219049/
https://www.ncbi.nlm.nih.gov/pubmed/30396340
http://dx.doi.org/10.1186/s12938-018-0573-6
_version_ 1783368573950885888
author Song, Hye-Jeong
Jo, Byeong-Cheol
Park, Chan-Young
Kim, Jong-Dae
Kim, Yu-Seop
author_sort Song, Hye-Jeong
collection PubMed
description BACKGROUND: Biomedical named entity recognition (Bio-NER) is a fundamental task in handling biomedical text terms, such as RNA, protein, cell type, cell line, and DNA. Bio-NER is one of the most elementary and core tasks in biomedical knowledge discovery from texts. The system described here is developed by using the BioNLP/NLPBA 2004 shared task. Experiments are conducted on a training and evaluation set provided by the task organizers. RESULTS: Our results show that, compared with a baseline having a 70.09% F1 score, the RNN Jordan- and Elman-type algorithms have F1 scores of approximately 60.53% and 58.80%, respectively. When we use CRF as a machine learning algorithm, CCA, GloVe, and Word2Vec have F1 scores of 72.73%, 72.74%, and 72.82%, respectively. CONCLUSIONS: By using the word embedding constructed through the unsupervised learning, the time and cost required to construct the learning data can be saved.
format Online
Article
Text
id pubmed-6219049
institution National Center for Biotechnology Information
language English
publishDate 2018
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-6219049 2018-11-08 Comparison of named entity recognition methodologies in biomedical documents Song, Hye-Jeong Jo, Byeong-Cheol Park, Chan-Young Kim, Jong-Dae Kim, Yu-Seop Biomed Eng Online Research BACKGROUND: Biomedical named entity recognition (Bio-NER) is a fundamental task in handling biomedical text terms, such as RNA, protein, cell type, cell line, and DNA. Bio-NER is one of the most elementary and core tasks in biomedical knowledge discovery from texts. The system described here is developed by using the BioNLP/NLPBA 2004 shared task. Experiments are conducted on a training and evaluation set provided by the task organizers. RESULTS: Our results show that, compared with a baseline having a 70.09% F1 score, the RNN Jordan- and Elman-type algorithms have F1 scores of approximately 60.53% and 58.80%, respectively. When we use CRF as a machine learning algorithm, CCA, GloVe, and Word2Vec have F1 scores of 72.73%, 72.74%, and 72.82%, respectively. CONCLUSIONS: By using the word embedding constructed through the unsupervised learning, the time and cost required to construct the learning data can be saved. BioMed Central 2018-11-06 /pmc/articles/PMC6219049/ /pubmed/30396340 http://dx.doi.org/10.1186/s12938-018-0573-6 Text en © The Author(s) 2018 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title Comparison of named entity recognition methodologies in biomedical documents
topic Research