BELB: a biomedical entity linking benchmark
MOTIVATION: Biomedical entity linking (BEL) is the task of grounding entity mentions to a knowledge base (KB). It plays a vital role in information extraction pipelines for the life sciences literature. We review recent work in the field and find that, as the task is absent from existing benchmarks...
Main authors: | Garda, Samuele; Weber-Genzel, Leon; Martin, Robert; Leser, Ulf |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Oxford University Press, 2023 |
Subjects: | Original Paper |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10681865/ https://www.ncbi.nlm.nih.gov/pubmed/37975879 http://dx.doi.org/10.1093/bioinformatics/btad698 |
_version_ | 1785150849111031808 |
---|---|
author | Garda, Samuele Weber-Genzel, Leon Martin, Robert Leser, Ulf |
author_facet | Garda, Samuele Weber-Genzel, Leon Martin, Robert Leser, Ulf |
author_sort | Garda, Samuele |
collection | PubMed |
description | MOTIVATION: Biomedical entity linking (BEL) is the task of grounding entity mentions to a knowledge base (KB). It plays a vital role in information extraction pipelines for the life sciences literature. We review recent work in the field and find that, as the task is absent from existing benchmarks for biomedical text mining, different studies adopt different experimental setups, making comparisons based on published numbers problematic. Furthermore, neural systems are tested primarily on instances linked to the broad-coverage KB UMLS, leaving their performance on more specialized KBs, e.g. those for genes or variants, understudied. RESULTS: We therefore developed BELB, a biomedical entity linking benchmark, providing access in a unified format to 11 corpora linked to 7 KBs and spanning six entity types: gene, disease, chemical, species, cell line, and variant. BELB greatly reduces preprocessing overhead in testing BEL systems on multiple corpora, offering a standardized testbed for reproducible experiments. Using BELB, we perform an extensive evaluation of six rule-based entity-specific systems and three recent neural approaches leveraging pre-trained language models. Our results reveal a mixed picture, showing that neural approaches fail to perform consistently across entity types and highlighting the need for further studies towards entity-agnostic models. AVAILABILITY AND IMPLEMENTATION: The source code of BELB is available at: https://github.com/sg-wbi/belb. The code to reproduce our experiments can be found at: https://github.com/sg-wbi/belb-exp. |
format | Online Article Text |
id | pubmed-10681865 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2023 |
publisher | Oxford University Press |
record_format | MEDLINE/PubMed |
spelling | pubmed-10681865 2023-11-30 BELB: a biomedical entity linking benchmark Garda, Samuele Weber-Genzel, Leon Martin, Robert Leser, Ulf Bioinformatics Original Paper Oxford University Press 2023-11-17 /pmc/articles/PMC10681865/ /pubmed/37975879 http://dx.doi.org/10.1093/bioinformatics/btad698 Text en © The Author(s) 2023. Published by Oxford University Press. https://creativecommons.org/licenses/by/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited. |
spellingShingle | Original Paper Garda, Samuele Weber-Genzel, Leon Martin, Robert Leser, Ulf BELB: a biomedical entity linking benchmark |
title | BELB: a biomedical entity linking benchmark |
title_full | BELB: a biomedical entity linking benchmark |
title_fullStr | BELB: a biomedical entity linking benchmark |
title_full_unstemmed | BELB: a biomedical entity linking benchmark |
title_short | BELB: a biomedical entity linking benchmark |
title_sort | belb: a biomedical entity linking benchmark |
topic | Original Paper |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10681865/ https://www.ncbi.nlm.nih.gov/pubmed/37975879 http://dx.doi.org/10.1093/bioinformatics/btad698 |