Wide-scope biomedical named entity recognition and normalization with CRFs, fuzzy matching and character level modeling

We present a system for automatically identifying a multitude of biomedical entities from the literature. This work is based on our previous efforts in the BioCreative VI: Interactive Bio-ID Assignment shared task in which our system demonstrated state-of-the-art performance with the highest achieve...

Bibliographic Details
Main Authors: Kaewphan, Suwisa; Hakala, Kai; Miekka, Niko; Salakoski, Tapio; Ginter, Filip
Format: Online Article Text
Language: English
Published: Oxford University Press 2018
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6146133/
https://www.ncbi.nlm.nih.gov/pubmed/30239666
http://dx.doi.org/10.1093/database/bay096
author Kaewphan, Suwisa
Hakala, Kai
Miekka, Niko
Salakoski, Tapio
Ginter, Filip
collection PubMed
description We present a system for automatically identifying a multitude of biomedical entities from the literature. This work is based on our previous efforts in the BioCreative VI: Interactive Bio-ID Assignment shared task in which our system demonstrated state-of-the-art performance with the highest achieved results in named entity recognition. In this paper we describe the original conditional random field-based system used in the shared task as well as experiments conducted since, including better hyperparameter tuning and character level modeling, which led to further performance improvements. For normalizing the mentions into unique identifiers we use fuzzy character n-gram matching. The normalization approach has also been improved with a better abbreviation resolution method and stricter guideline compliance resulting in vastly improved results for various entity types. All tools and models used for both named entity recognition and normalization are publicly available under open license.
format Online
Article
Text
id pubmed-6146133
institution National Center for Biotechnology Information
language English
publishDate 2018
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-6146133 2018-09-25
Database (Oxford)
Original Article
Oxford University Press 2018-09-18
/pmc/articles/PMC6146133/
/pubmed/30239666
http://dx.doi.org/10.1093/database/bay096
Text
en
© The Author(s) 2018. Published by Oxford University Press.
http://creativecommons.org/licenses/by/4.0/
This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
title Wide-scope biomedical named entity recognition and normalization with CRFs, fuzzy matching and character level modeling
topic Original Article