Improving chemical entity recognition through h-index based semantic similarity

BACKGROUND: Our approach to the BioCreative IV challenge of recognition and classification of drug names (CHEMDNER task) aimed at achieving high levels of precision by applying semantic similarity validation techniques to Chemical Entities of Biological Interest (ChEBI) mappings. Our assumption is t...

Bibliographic Details
Main Authors: Lamurias, Andre; Ferreira, João D; Couto, Francisco M
Format: Online Article Text
Language: English
Published: BioMed Central 2015
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4331689/
https://www.ncbi.nlm.nih.gov/pubmed/25810770
http://dx.doi.org/10.1186/1758-2946-7-S1-S13
author Lamurias, Andre
Ferreira, João D
Couto, Francisco M
author_sort Lamurias, Andre
collection PubMed
description BACKGROUND: Our approach to the BioCreative IV challenge of recognition and classification of drug names (CHEMDNER task) aimed at achieving high levels of precision by applying semantic similarity validation techniques to Chemical Entities of Biological Interest (ChEBI) mappings. Our assumption is that the chemical entities mentioned in the same fragment of text should share some semantic relation. This validation method was further improved by adapting the semantic similarity measure to take into account the h-index of each ancestor. We applied this method in two measures, simUI and simGIC, and validated the results obtained for the competition, comparing each adapted measure to its original version. RESULTS: For the competition, we trained a Random Forest classifier that uses various scores provided by our system, including semantic similarity, which improved the F-measure obtained with the Conditional Random Fields classifiers by 4.6%. Using a notion of concept relevance based on the h-index measure, we were able to enhance our validation process so that, for a fixed recall, we increased precision by excluding a larger number of false positives from the results. We plotted precision and recall values for a range of validation thresholds using different similarity measures, obtaining higher precision values for the same recall with the measures based on the h-index. CONCLUSIONS: The semantic similarity measure we introduced was more efficient at validating text mining results from machine learning classifiers than other measures. We improved the results we obtained for the CHEMDNER task by maintaining high precision values while improving the recall and F-measure.
format Online
Article
Text
id pubmed-4331689
institution National Center for Biotechnology Information
language English
publishDate 2015
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-4331689 2015-03-25 Improving chemical entity recognition through h-index based semantic similarity Lamurias, Andre Ferreira, João D Couto, Francisco M J Cheminform Research
BioMed Central 2015-01-19 /pmc/articles/PMC4331689/ /pubmed/25810770 http://dx.doi.org/10.1186/1758-2946-7-S1-S13 Text en Copyright © 2015 Lamurias et al.; licensee Springer. http://creativecommons.org/licenses/by/4.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title Improving chemical entity recognition through h-index based semantic similarity
topic Research
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4331689/
https://www.ncbi.nlm.nih.gov/pubmed/25810770
http://dx.doi.org/10.1186/1758-2946-7-S1-S13