OGER++: hybrid multi-type entity recognition

BACKGROUND: We present a text-mining tool for recognizing biomedical entities in scientific literature. OGER++ is a hybrid system for named entity recognition and concept recognition (linking), which combines a dictionary-based annotator with a corpus-based disambiguation component. The annotator us...

Bibliographic Details
Main Authors:	Furrer, Lenz; Jancso, Anna; Colic, Nicola; Rinaldi, Fabio
Format:	Online Article Text
Language:	English
Published:	Springer International Publishing 2019
Subjects:	Research Article
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6689863/ https://www.ncbi.nlm.nih.gov/pubmed/30666476 http://dx.doi.org/10.1186/s13321-018-0326-3

_version_	1783443102707482624
author	Furrer, Lenz; Jancso, Anna; Colic, Nicola; Rinaldi, Fabio
author_facet	Furrer, Lenz; Jancso, Anna; Colic, Nicola; Rinaldi, Fabio
author_sort	Furrer, Lenz
collection	PubMed
description	BACKGROUND: We present a text-mining tool for recognizing biomedical entities in scientific literature. OGER++ is a hybrid system for named entity recognition and concept recognition (linking), which combines a dictionary-based annotator with a corpus-based disambiguation component. The annotator uses an efficient look-up strategy combined with a normalization method for matching spelling variants. The disambiguation classifier is implemented as a feed-forward neural network which acts as a postfilter to the previous step. RESULTS: We evaluated the system in terms of processing speed and annotation quality. In the speed benchmarks, the OGER++ web service processes 9.7 abstracts or 0.9 full-text documents per second. On the CRAFT corpus, we achieved 71.4% and 56.7% F1 for named entity recognition and concept recognition, respectively. CONCLUSIONS: Combining knowledge-based and data-driven components allows creating a system with competitive performance in biomedical text mining.
format	Online Article Text
id	pubmed-6689863
institution	National Center for Biotechnology Information
language	English
publishDate	2019
publisher	Springer International Publishing
record_format	MEDLINE/PubMed
spelling	pubmed-6689863 2019-08-15 OGER++: hybrid multi-type entity recognition Furrer, Lenz Jancso, Anna Colic, Nicola Rinaldi, Fabio J Cheminform Research Article BACKGROUND: We present a text-mining tool for recognizing biomedical entities in scientific literature. OGER++ is a hybrid system for named entity recognition and concept recognition (linking), which combines a dictionary-based annotator with a corpus-based disambiguation component. The annotator uses an efficient look-up strategy combined with a normalization method for matching spelling variants. The disambiguation classifier is implemented as a feed-forward neural network which acts as a postfilter to the previous step. RESULTS: We evaluated the system in terms of processing speed and annotation quality. In the speed benchmarks, the OGER++ web service processes 9.7 abstracts or 0.9 full-text documents per second. On the CRAFT corpus, we achieved 71.4% and 56.7% F1 for named entity recognition and concept recognition, respectively. CONCLUSIONS: Combining knowledge-based and data-driven components allows creating a system with competitive performance in biomedical text mining. Springer International Publishing 2019-01-21 /pmc/articles/PMC6689863/ /pubmed/30666476 http://dx.doi.org/10.1186/s13321-018-0326-3 Text en © The Author(s) 2019 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
spellingShingle	Research Article Furrer, Lenz Jancso, Anna Colic, Nicola Rinaldi, Fabio OGER++: hybrid multi-type entity recognition
title	OGER++: hybrid multi-type entity recognition
title_full	OGER++: hybrid multi-type entity recognition
title_fullStr	OGER++: hybrid multi-type entity recognition
title_full_unstemmed	OGER++: hybrid multi-type entity recognition
title_short	OGER++: hybrid multi-type entity recognition
title_sort	oger++: hybrid multi-type entity recognition
topic	Research Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6689863/ https://www.ncbi.nlm.nih.gov/pubmed/30666476 http://dx.doi.org/10.1186/s13321-018-0326-3
work_keys_str_mv	AT furrerlenz ogerhybridmultitypeentityrecognition AT jancsoanna ogerhybridmultitypeentityrecognition AT colicnicola ogerhybridmultitypeentityrecognition AT rinaldifabio ogerhybridmultitypeentityrecognition
