
Enhancement of Chemical Entity Identification in Text Using Semantic Similarity Validation

With the amount of chemical data being produced and reported in the literature growing at a fast pace, it is increasingly important to efficiently retrieve this information. To tackle this issue text mining tools have been applied, but despite their good performance they still provide many errors th...


Bibliographic Details
Main authors: Grego, Tiago; Couto, Francisco M.
Format: Online Article Text
Language: English
Published: Public Library of Science 2013
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3642108/
https://www.ncbi.nlm.nih.gov/pubmed/23658791
http://dx.doi.org/10.1371/journal.pone.0062984
_version_ 1782268101693276160
author Grego, Tiago
Couto, Francisco M.
author_facet Grego, Tiago
Couto, Francisco M.
author_sort Grego, Tiago
collection PubMed
description With the amount of chemical data produced and reported in the literature growing at a fast pace, it is increasingly important to retrieve this information efficiently. To tackle this issue, text mining tools have been applied, but despite their good performance they still produce many errors that we believe can be filtered out using semantic similarity. Thus, this paper proposes a novel method that receives the results of chemical entity identification systems, such as Whatizit, and exploits the semantic relationships in ChEBI to measure the similarity between the entities found in the text. The method assigns a single validation score to each entity based on its similarities with the other entities also identified in the text. Then, using a given threshold, the method selects a set of validated entities and a set of outlier entities. We evaluated our method using the results of two state-of-the-art chemical entity identification tools, three semantic similarity measures and two text window sizes. The method was able to increase precision without filtering out a significant number of correctly identified entities. This means that the method can effectively discriminate the correctly identified chemical entities, while discarding a significant number of identification errors. For example, selecting a validation set with 75% of all identified entities, we were able to increase the precision by 28% for one of the chemical entity identification tools (Whatizit), maintaining in that subset 97% of the correctly identified entities. Our method can be directly used as an add-on by any state-of-the-art entity identification tool that provides mappings to a database, in order to improve its results. The proposed method is included in a freely accessible web tool at www.lasige.di.fc.ul.pt/webtools/ice/.
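The scoring and thresholding steps described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The toy ancestor sets are hypothetical and are not real ChEBI entries, the Jaccard overlap is only a stand-in for the semantic similarity measures the paper evaluates, and aggregating by the mean is an assumption, since the abstract does not say how the pairwise similarities are combined into one score. The last function works through the arithmetic of the reported example, assuming the precision increase is read as a relative gain.

```python
from statistics import mean

# Hypothetical toy ontology: each entity maps to the set of its ancestor classes.
# These names are illustrative only and are not taken from ChEBI.
ANCESTORS = {
    "ethanol": {"alcohol", "organic compound", "chemical entity"},
    "methanol": {"alcohol", "organic compound", "chemical entity"},
    "glucose": {"carbohydrate", "organic compound", "chemical entity"},
    "lead": {"metal atom", "chemical entity"},  # e.g. the verb "lead" mis-tagged
}


def similarity(a, b):
    """Jaccard overlap of ancestor sets (stand-in for a semantic similarity measure)."""
    sa, sb = ANCESTORS[a], ANCESTORS[b]
    return len(sa & sb) / len(sa | sb)


def validation_scores(entities):
    """Score each entity by its mean similarity to the other entities in the window.

    Using the mean is an assumption made for this sketch.
    """
    scores = {}
    for entity in entities:
        others = [o for o in entities if o != entity]
        scores[entity] = mean([similarity(entity, o) for o in others]) if others else 0.0
    return scores


def split_by_threshold(scores, threshold):
    """Return (validated, outliers): entities scoring at or above the threshold, and the rest."""
    validated = {e for e, s in scores.items() if s >= threshold}
    return validated, set(scores) - validated


def relative_precision_gain(kept_fraction, retained_correct_fraction):
    """Relative precision change when keeping `kept_fraction` of all entities
    while retaining `retained_correct_fraction` of the correct ones."""
    return retained_correct_fraction / kept_fraction - 1.0


scores = validation_scores(["ethanol", "methanol", "glucose", "lead"])
validated, outliers = split_by_threshold(scores, threshold=0.4)
print(sorted(validated), sorted(outliers))
print(round(relative_precision_gain(0.75, 0.97), 3))
```

In this toy window the mis-tagged "lead" shares little with the other entities, so it gets the lowest score and falls below the threshold. With the figures quoted in the abstract (75% of entities kept, 97% of correct entities retained), the last function gives a relative gain of about 29%, which is close to the reported 28% under that reading.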
format Online
Article
Text
id pubmed-3642108
institution National Center for Biotechnology Information
language English
publishDate 2013
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-3642108 2013-05-08 Enhancement of Chemical Entity Identification in Text Using Semantic Similarity Validation Grego, Tiago Couto, Francisco M. PLoS One Research Article Public Library of Science 2013-05-02 /pmc/articles/PMC3642108/ /pubmed/23658791 http://dx.doi.org/10.1371/journal.pone.0062984 Text en © 2013 Grego, Couto http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are properly credited.
spellingShingle Research Article
Grego, Tiago
Couto, Francisco M.
Enhancement of Chemical Entity Identification in Text Using Semantic Similarity Validation
title Enhancement of Chemical Entity Identification in Text Using Semantic Similarity Validation
title_full Enhancement of Chemical Entity Identification in Text Using Semantic Similarity Validation
title_fullStr Enhancement of Chemical Entity Identification in Text Using Semantic Similarity Validation
title_full_unstemmed Enhancement of Chemical Entity Identification in Text Using Semantic Similarity Validation
title_short Enhancement of Chemical Entity Identification in Text Using Semantic Similarity Validation
title_sort enhancement of chemical entity identification in text using semantic similarity validation
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3642108/
https://www.ncbi.nlm.nih.gov/pubmed/23658791
http://dx.doi.org/10.1371/journal.pone.0062984
work_keys_str_mv AT gregotiago enhancementofchemicalentityidentificationintextusingsemanticsimilarityvalidation
AT coutofranciscom enhancementofchemicalentityidentificationintextusingsemanticsimilarityvalidation