Linking chemical and disease entities to ontologies by integrating PageRank with extracted relations from literature
Main authors: | Ruas, Pedro; Lamurias, Andre; Couto, Francisco M. |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Springer International Publishing, 2020 |
Subjects: | Research Article |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7507273/ https://www.ncbi.nlm.nih.gov/pubmed/33430995 http://dx.doi.org/10.1186/s13321-020-00461-4 |
field | value |
---|---|
author | Ruas, Pedro; Lamurias, Andre; Couto, Francisco M. |
collection | PubMed |
description | BACKGROUND: Named Entity Linking systems are a powerful aid to the manual curation of digital libraries, which is becoming increasingly costly and inefficient due to information overload. Models based on the Personalized PageRank (PPR) algorithm are among the state-of-the-art approaches, but they perform poorly when the disambiguation graphs are sparse. FINDINGS: This work proposes a Named Entity Linking framework named Relation Extraction for Entity Linking (REEL) that uses automatically extracted relations to overcome this limitation. Our method builds a disambiguation graph in which the nodes are the ontology candidates for the entities and the edges are added according to the relations that the method automatically extracts from the text. The PPR algorithm and the information content of each ontology are then applied to choose, for each entity, the candidate that maximises the coherence of the disambiguation graph. We evaluated the method on three gold standards: the subset of the CRAFT corpus with ChEBI annotations (CRAFT-ChEBI), the subset of the BC5CDR corpus with disease annotations from the MEDIC vocabulary (BC5CDR-Diseases), and the subset of the BC5CDR corpus with chemical annotations from the CTD-Chemical vocabulary (BC5CDR-Chemicals). REEL achieved F1-scores of 85.8%, 80.9% and 90.3% on these gold standards, respectively, outperforming the baseline approaches. CONCLUSIONS: We demonstrated that Relation Extraction (RE) tools can improve Named Entity Linking by capturing semantic information that is expressed in text but missing from Knowledge Bases, and by using it to improve the disambiguation graph of Named Entity Linking models. REEL can be adapted to any text mining pipeline and potentially to any domain, as long as an ontology or other knowledge base is available. |
id | pubmed-7507273 |
institution | National Center for Biotechnology Information |
record_format | MEDLINE/PubMed |
journal | J Cheminform |
published_online | 2020-09-21 |
rights | © The Author(s) 2020. Open Access. This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data. |
title | Linking chemical and disease entities to ontologies by integrating PageRank with extracted relations from literature |
topic | Research Article |
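The FINDINGS part of the description outlines REEL's core idea: ontology candidates become graph nodes, relations extracted from the text become edges, and PPR together with information content selects one candidate per entity. The Python sketch below illustrates that idea under stated assumptions. It is not the authors' code. The function names (`build_graph`, `personalized_pagerank`, `disambiguate`), the toy candidate IDs, the rule of linking every candidate pair of two related mentions, the coherence measure (the sum, over the other mentions, of the highest PPR score among their candidates) and the multiplication of coherence by a precomputed information-content value are all hypothetical choices made for illustration. PPR is computed by plain power iteration with damping factor `alpha`, restarting at the candidate being scored.

```python
from itertools import product

# Hypothetical sketch: names, IDs and the scoring rule are illustrative
# assumptions, not REEL's published implementation.


def build_graph(candidates, relations, kb_edges=()):
    """Undirected disambiguation graph whose nodes are candidate ontology IDs."""
    adj = {}
    for cands in candidates.values():
        for c in cands:
            adj.setdefault(c, set())
    # Assumption: each relation extracted from the text links every candidate
    # of one mention to every candidate of the other mention.
    for m1, m2 in relations:
        for c1, c2 in product(candidates[m1], candidates[m2]):
            if c1 != c2:
                adj[c1].add(c2)
                adj[c2].add(c1)
    # Optional ontology/KB links between candidates already in the graph.
    for u, v in kb_edges:
        if u in adj and v in adj and u != v:
            adj[u].add(v)
            adj[v].add(u)
    return adj


def personalized_pagerank(adj, seed, alpha=0.85, iters=50):
    """Power iteration for PageRank whose restart mass all goes to `seed`."""
    rank = {n: (1.0 if n == seed else 0.0) for n in adj}
    for _ in range(iters):
        nxt = {n: 0.0 for n in adj}
        for n, neighbours in adj.items():
            if neighbours:
                share = alpha * rank[n] / len(neighbours)
                for m in neighbours:
                    nxt[m] += share
            else:
                # Dangling node: return its walking mass to the seed.
                nxt[seed] += alpha * rank[n]
        nxt[seed] += 1.0 - alpha
        rank = nxt
    return rank


def disambiguate(candidates, relations, ic, kb_edges=(), alpha=0.85):
    """Pick one candidate per mention: PPR coherence weighted by information content."""
    adj = build_graph(candidates, relations, kb_edges)
    chosen = {}
    for mention, cands in candidates.items():
        best, best_score = None, float("-inf")
        for c in cands:
            ppr = personalized_pagerank(adj, c, alpha)
            # Coherence: for every other mention, the PPR mass reaching its
            # best-connected candidate when the walk restarts at c.
            coherence = sum(
                max(ppr[o] for o in other_cands)
                for other, other_cands in candidates.items()
                if other != mention and other_cands
            )
            score = coherence * ic.get(c, 1.0)  # assumed combination rule
            if score > best_score:
                best, best_score = c, score
        chosen[mention] = best
    return chosen


if __name__ == "__main__":
    # No relation was extracted, but a KB edge links "a" to "c":
    # "b" gets zero coherence, so "a" wins despite its higher-IC rival.
    print(disambiguate({"m1": ["a", "b"], "m2": ["c"]}, [], {"a": 1.0, "b": 3.0},
                       kb_edges=[("a", "c")]))
    # {'m1': 'a', 'm2': 'c'}
```

When no relation or KB edge touches a mention's candidates, every candidate scores zero and the sketch simply keeps the first one, which mirrors the sparse-graph weakness that the description gives as the motivation for adding extracted relations.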