
MER: a shell script and annotation server for minimal named entity recognition and linking

Named-entity recognition aims at identifying the fragments of text that mention entities of interest, that afterwards could be linked to a knowledge base where those entities are described. This manuscript presents our minimal named-entity recognition and linking tool (MER), designed with flexibilit...


Bibliographic Details
Main Authors: Couto, Francisco M., Lamurias, Andre
Format: Online Article Text
Language: English
Published: Springer International Publishing 2018
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6755715/
https://www.ncbi.nlm.nih.gov/pubmed/30519990
http://dx.doi.org/10.1186/s13321-018-0312-9
author Couto, Francisco M.
Lamurias, Andre
collection PubMed
description Named-entity recognition aims at identifying the fragments of text that mention entities of interest, that afterwards could be linked to a knowledge base where those entities are described. This manuscript presents our minimal named-entity recognition and linking tool (MER), designed with flexibility, autonomy and efficiency in mind. To annotate a given text, MER only requires: (1) a lexicon (text file) with the list of terms representing the entities of interest; (2) optionally a tab-separated values file with a link for each term; (3) and a Unix shell. Alternatively, the user can provide an ontology from where MER will automatically generate the lexicon and links files. The efficiency of MER derives from exploring the high performance and reliability of the text processing command-line tools grep and awk, and a novel inverted recognition technique. MER was deployed in a cloud infrastructure using multiple Virtual Machines to work as an annotation server and participate in the Technical Interoperability and Performance of annotation Servers task of BioCreative V.5. The results show that our solution processed each document (text retrieval and annotation) in less than 3 s on average without using any type of cache. MER was also compared to a state-of-the-art dictionary lookup solution obtaining competitive results not only in computational performance but also in precision and recall. MER is publicly available in a GitHub repository (https://github.com/lasigeBioTM/MER) and through a RESTful Web service (http://labs.fc.ul.pt/mer/).
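The inputs listed in the description (a lexicon of terms and an optional term-to-link mapping) can be illustrated with a minimal dictionary-lookup sketch. This is not MER's implementation, which the description attributes to grep, awk and an inverted recognition technique; it is a plain regular-expression lookup, and the function name `annotate`, the sample terms, and the `example.org` links are all hypothetical.

```python
import re


def annotate(text, lexicon, links=None):
    """Return sorted (start, end, matched_text, link) tuples for lexicon terms in text.

    Illustrative only: a naive per-term scan, not MER's grep/awk approach.
    `links` maps a lowercased term to a URL; terms without a link get "".
    """
    links = links or {}
    matches = []
    for term in lexicon:
        # Case-insensitive match that does not start or end inside a word.
        pattern = re.compile(
            r"(?<!\w)" + re.escape(term) + r"(?!\w)", re.IGNORECASE
        )
        for m in pattern.finditer(text):
            link = links.get(term.lower(), "")
            matches.append((m.start(), m.end(), m.group(0), link))
    return sorted(matches)


# Hypothetical lexicon and links, standing in for the two input files.
lexicon = ["aspirin", "acetylsalicylic acid"]
links = {
    "aspirin": "http://example.org/term/1",
    "acetylsalicylic acid": "http://example.org/term/2",
}
text = "Aspirin is also known as acetylsalicylic acid."

for start, end, term, link in annotate(text, lexicon, links):
    # One tab-separated annotation per line.
    print(f"{start}\t{end}\t{term}\t{link}")
```

Each term is scanned separately here, so the cost grows with the size of the lexicon; that is acceptable for a sketch but is the kind of overhead a tuned tool would avoid.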
format Online
Article
Text
id pubmed-6755715
institution National Center for Biotechnology Information
language English
publishDate 2018
publisher Springer International Publishing
record_format MEDLINE/PubMed
spelling pubmed-6755715 2019-09-26 MER: a shell script and annotation server for minimal named entity recognition and linking Couto, Francisco M. Lamurias, Andre J Cheminform Research Article
Springer International Publishing 2018-12-05 /pmc/articles/PMC6755715/ /pubmed/30519990 http://dx.doi.org/10.1186/s13321-018-0312-9 Text en © The Author(s) 2018 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title MER: a shell script and annotation server for minimal named entity recognition and linking
topic Research Article