Gimli: open source and high-performance biomedical name recognition

BACKGROUND: Automatic recognition of biomedical names is an essential task in biomedical information extraction, presenting several complex and unsolved challenges. In recent years, various solutions have been implemented to tackle this problem. However, limitations regarding system characteristics,...

Bibliographic Details
Main Authors: Campos, David; Matos, Sérgio; Oliveira, José Luís
Format: Online Article Text
Language: English
Published: BioMed Central 2013
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3651325/
https://www.ncbi.nlm.nih.gov/pubmed/23413997
http://dx.doi.org/10.1186/1471-2105-14-54
_version_ 1782269205460025344
author Campos, David
Matos, Sérgio
Oliveira, José Luís
author_facet Campos, David
Matos, Sérgio
Oliveira, José Luís
author_sort Campos, David
collection PubMed
description BACKGROUND: Automatic recognition of biomedical names is an essential task in biomedical information extraction, presenting several complex and unsolved challenges. In recent years, various solutions have been implemented to tackle this problem. However, limitations regarding system characteristics, customization and usability still hinder their wider application outside text mining research. RESULTS: We present Gimli, an open-source, state-of-the-art tool for automatic recognition of biomedical names. Gimli includes an extended set of implemented and user-selectable features, such as orthographic, morphological, linguistic-based, conjunctions and dictionary-based. A simple and fast method to combine different trained models is also provided. Gimli achieves an F-measure of 87.17% on GENETAG and 72.23% on JNLPBA corpus, significantly outperforming existing open-source solutions. CONCLUSIONS: Gimli is an off-the-shelf, ready to use tool for named-entity recognition, providing trained and optimized models for recognition of biomedical entities from scientific text. It can be used as a command line tool, offering full functionality, including training of new models and customization of the feature set and model parameters through a configuration file. Advanced users can integrate Gimli in their text mining workflows through the provided library, and extend or adapt its functionalities. Based on the underlying system characteristics and functionality, both for final users and developers, and on the reported performance results, we believe that Gimli is a state-of-the-art solution for biomedical NER, contributing to faster and better research in the field. Gimli is freely available at http://bioinformatics.ua.pt/gimli.
format Online
Article
Text
id pubmed-3651325
institution National Center for Biotechnology Information
language English
publishDate 2013
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-36513252013-05-14 Gimli: open source and high-performance biomedical name recognition Campos, David Matos, Sérgio Oliveira, José Luís BMC Bioinformatics Software BACKGROUND: Automatic recognition of biomedical names is an essential task in biomedical information extraction, presenting several complex and unsolved challenges. In recent years, various solutions have been implemented to tackle this problem. However, limitations regarding system characteristics, customization and usability still hinder their wider application outside text mining research. RESULTS: We present Gimli, an open-source, state-of-the-art tool for automatic recognition of biomedical names. Gimli includes an extended set of implemented and user-selectable features, such as orthographic, morphological, linguistic-based, conjunctions and dictionary-based. A simple and fast method to combine different trained models is also provided. Gimli achieves an F-measure of 87.17% on GENETAG and 72.23% on JNLPBA corpus, significantly outperforming existing open-source solutions. CONCLUSIONS: Gimli is an off-the-shelf, ready to use tool for named-entity recognition, providing trained and optimized models for recognition of biomedical entities from scientific text. It can be used as a command line tool, offering full functionality, including training of new models and customization of the feature set and model parameters through a configuration file. Advanced users can integrate Gimli in their text mining workflows through the provided library, and extend or adapt its functionalities. Based on the underlying system characteristics and functionality, both for final users and developers, and on the reported performance results, we believe that Gimli is a state-of-the-art solution for biomedical NER, contributing to faster and better research in the field. Gimli is freely available at http://bioinformatics.ua.pt/gimli. 
BioMed Central 2013-02-15 /pmc/articles/PMC3651325/ /pubmed/23413997 http://dx.doi.org/10.1186/1471-2105-14-54 Text en Copyright © 2013 Campos et al.; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
spellingShingle Software
Campos, David
Matos, Sérgio
Oliveira, José Luís
Gimli: open source and high-performance biomedical name recognition
title Gimli: open source and high-performance biomedical name recognition
title_full Gimli: open source and high-performance biomedical name recognition
title_fullStr Gimli: open source and high-performance biomedical name recognition
title_full_unstemmed Gimli: open source and high-performance biomedical name recognition
title_short Gimli: open source and high-performance biomedical name recognition
title_sort gimli: open source and high-performance biomedical name recognition
topic Software
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3651325/
https://www.ncbi.nlm.nih.gov/pubmed/23413997
http://dx.doi.org/10.1186/1471-2105-14-54
work_keys_str_mv AT camposdavid gimliopensourceandhighperformancebiomedicalnamerecognition
AT matossergio gimliopensourceandhighperformancebiomedicalnamerecognition
AT oliveirajoseluis gimliopensourceandhighperformancebiomedicalnamerecognition