BELB: a biomedical entity linking benchmark

Bibliographic Details
Main authors: Garda, Samuele; Weber-Genzel, Leon; Martin, Robert; Leser, Ulf
Format: Online Article Text
Language: English
Published: Oxford University Press 2023
Subjects: Original Paper
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10681865/
https://www.ncbi.nlm.nih.gov/pubmed/37975879
http://dx.doi.org/10.1093/bioinformatics/btad698
Collection: PubMed
Description: MOTIVATION: Biomedical entity linking (BEL) is the task of grounding entity mentions to a knowledge base (KB). It plays a vital role in information extraction pipelines for the life sciences literature. We review recent work in the field and find that, as the task is absent from existing benchmarks for biomedical text mining, different studies adopt different experimental setups, making comparisons based on published numbers problematic. Furthermore, neural systems are tested primarily on instances linked to the broad-coverage KB UMLS, leaving their performance on more specialized KBs, e.g. for genes or variants, understudied. RESULTS: We therefore developed BELB, a biomedical entity linking benchmark, providing access in a unified format to 11 corpora linked to 7 KBs and spanning six entity types: gene, disease, chemical, species, cell line, and variant. BELB greatly reduces the preprocessing overhead of testing BEL systems on multiple corpora, offering a standardized testbed for reproducible experiments. Using BELB, we perform an extensive evaluation of six rule-based entity-specific systems and three recent neural approaches leveraging pre-trained language models. Our results reveal a mixed picture, showing that neural approaches fail to perform consistently across entity types and highlighting the need for further studies towards entity-agnostic models. AVAILABILITY AND IMPLEMENTATION: The source code of BELB is available at https://github.com/sg-wbi/belb. The code to reproduce our experiments can be found at https://github.com/sg-wbi/belb-exp.
Institution: National Center for Biotechnology Information
Record format: MEDLINE/PubMed
Journal: Bioinformatics (Original Paper), Oxford University Press, 2023-11-17
© The Author(s) 2023. Published by Oxford University Press.
License: Creative Commons Attribution 4.0 (https://creativecommons.org/licenses/by/4.0/). This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.