Linking chemical and disease entities to ontologies by integrating PageRank with extracted relations from literature

BACKGROUND: Named Entity Linking systems are a powerful aid to the manual curation of digital libraries, which is becoming increasingly costly and inefficient because of information overload. Models based on the Personalized PageRank (PPR) algorithm are among the state-of-the-art approaches, but the...

Bibliographic Details
Main authors: Ruas, Pedro; Lamurias, Andre; Couto, Francisco M.
Format: Online Article Text
Language: English
Published: Springer International Publishing 2020
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7507273/
https://www.ncbi.nlm.nih.gov/pubmed/33430995
http://dx.doi.org/10.1186/s13321-020-00461-4
author Ruas, Pedro
Lamurias, Andre
Couto, Francisco M.
collection PubMed
description BACKGROUND: Named Entity Linking systems are a powerful aid to the manual curation of digital libraries, which is becoming increasingly costly and inefficient because of information overload. Models based on the Personalized PageRank (PPR) algorithm are among the state-of-the-art approaches, but they perform poorly when the disambiguation graphs are sparse. FINDINGS: This work proposes a Named Entity Linking framework called Relation Extraction for Entity Linking (REEL), which uses automatically extracted relations to overcome this limitation. The method builds a disambiguation graph in which the nodes are the ontology candidates for the entities and the edges are added according to relations that the method extracts automatically from the text. The PPR algorithm and the information content of each ontology are then applied to choose, for each entity, the candidate that maximises the coherence of the disambiguation graph. We evaluated the method on three gold standards: the subset of the CRAFT corpus with ChEBI annotations (CRAFT-ChEBI), the subset of the BC5CDR corpus with disease annotations from the MEDIC vocabulary (BC5CDR-Diseases) and the subset with chemical annotations from the CTD-Chemical vocabulary (BC5CDR-Chemicals). The F1-score achieved by REEL was 85.8%, 80.9% and 90.3% on these gold standards, respectively, outperforming baseline approaches. CONCLUSIONS: We demonstrated that Relation Extraction (RE) tools can improve Named Entity Linking by capturing semantic information that is expressed in text but missing from knowledge bases, and by using it to enrich the disambiguation graph of Named Entity Linking models. REEL can be adapted to any text mining pipeline and potentially to any domain, as long as an ontology or other knowledge base is available.
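The graph-based disambiguation step described in the abstract can be illustrated with a small sketch. This is not the authors' REEL implementation: it is a simplified Python example that assumes candidate identifiers are unique across mentions, treats extracted relations as undirected edges, scores each candidate by the Personalized PageRank mass it receives from candidates of other mentions, and leaves out the information content term entirely. The function names, mention names and candidate identifiers (A1, B1 and so on) are hypothetical placeholders.

```python
from collections import defaultdict


def personalized_pagerank(neighbours, source, damping=0.85, iterations=50):
    """Power iteration for PPR with all restart mass on one source node."""
    scores = {node: 0.0 for node in neighbours}
    scores[source] = 1.0
    for _ in range(iterations):
        updated = {node: 0.0 for node in neighbours}
        for node, linked in neighbours.items():
            if linked:
                # Spread this node's damped score evenly over its edges.
                share = damping * scores[node] / len(linked)
                for other in linked:
                    updated[other] += share
            else:
                # A node without edges sends its walk back to the source.
                updated[source] += damping * scores[node]
        # Restart: the walker jumps back to the source node.
        updated[source] += 1.0 - damping
        scores = updated
    return scores


def link_entities(candidates, relations):
    """Pick, per mention, the candidate best supported by other mentions."""
    # Nodes are ontology candidates; edges come from extracted relations.
    neighbours = {c: set() for cands in candidates.values() for c in cands}
    for left, right in relations:
        neighbours[left].add(right)
        neighbours[right].add(left)
    mention_of = {c: m for m, cands in candidates.items() for c in cands}
    # Coherence of a candidate: PPR mass received from other mentions.
    coherence = defaultdict(float)
    for source in neighbours:
        ppr = personalized_pagerank(neighbours, source)
        for target, score in ppr.items():
            if mention_of[target] != mention_of[source]:
                coherence[target] += score
    return {
        mention: max(cands, key=lambda c: coherence[c])
        for mention, cands in candidates.items()
    }


# Hypothetical toy input: three mentions and their ontology candidates.
candidates = {
    "mention_1": ["A1", "A2"],
    "mention_2": ["B1", "B2"],
    "mention_3": ["C1"],
}
# Hypothetical relations extracted from text, as candidate pairs.
relations = [("A1", "B1"), ("B1", "C1"), ("A1", "C1"), ("A2", "B2")]
linked = link_entities(candidates, relations)
print(linked)
```

In this toy input, A1 and B1 are each connected to candidates of two other mentions, so they collect more score than A2 and B2, which only support each other. A sparser graph with fewer relation edges would leave less evidence to separate the candidates, which is the limitation the abstract attributes to PPR-based models.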
format Online
Article
Text
id pubmed-7507273
institution National Center for Biotechnology Information
language English
publishDate 2020
publisher Springer International Publishing
record_format MEDLINE/PubMed
spelling pubmed-7507273 2020-09-23 J Cheminform Research Article Springer International Publishing 2020-09-21 /pmc/articles/PMC7507273/ /pubmed/33430995 http://dx.doi.org/10.1186/s13321-020-00461-4 Text en © The Author(s) 2020 Open Access. This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
title Linking chemical and disease entities to ontologies by integrating PageRank with extracted relations from literature
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7507273/
https://www.ncbi.nlm.nih.gov/pubmed/33430995
http://dx.doi.org/10.1186/s13321-020-00461-4