
Improving chemical entity recognition through h-index based semantic similarity

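BACKGROUND: Our approach to the BioCreative IV challenge of recognition and classification of drug names (CHEMDNER task) aimed at achieving high levels of precision by applying semantic similarity validation techniques to Chemical Entities of Biological Interest (ChEBI) mappings. Our assumption is that the chemical entities mentioned in the same fragment of text should share some semantic relation. This validation method was further improved by adapting the semantic similarity measure to take into account the h-index of each ancestor. We applied this method in two measures, simUI and simGIC, and validated the results obtained for the competition, comparing each adapted measure to its original version. RESULTS: For the competition, we trained a Random Forest classifier that uses various scores provided by our system, including semantic similarity, which improved the F-measure obtained with the Conditional Random Fields classifiers by 4.6%. Using a notion of concept relevance based on the h-index measure, we were able to enhance our validation process so that for a fixed recall, we increased precision by excluding from the results a higher amount of false positives. We plotted precision and recall values for a range of validation thresholds using different similarity measures, obtaining higher precision values for the same recall with the measures based on the h-index. CONCLUSIONS: The semantic similarity measure we introduced was more efficient at validating text mining results from machine learning classifiers than other measures. We improved the results we obtained for the CHEMDNER task by maintaining high precision values while improving the recall and F-measure.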
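The abstract describes adapting two ontology-based similarity measures, simUI and simGIC, so that each ancestor is weighted by its h-index. The Python sketch below illustrates that idea on a toy is_a hierarchy. Several pieces are assumptions, not the paper's exact definitions: the hierarchy and its concept names are hypothetical (not real ChEBI entries); a concept's h-index is taken to be the largest h such that at least h of its children each have at least h children; each ancestor's weight is h + 1 so that concepts with an h-index of 0 still count; and information content is a structural estimate, the negative log of the share of concepts a term subsumes.

```python
import math

# Hypothetical toy is_a hierarchy (child -> parents); names are illustrative,
# not real ChEBI identifiers.
PARENTS = {
    "chemical entity": [],
    "molecular entity": ["chemical entity"],
    "organic molecule": ["molecular entity"],
    "inorganic molecule": ["molecular entity"],
    "carboxylic acid": ["organic molecule"],
    "alcohol": ["organic molecule"],
    "acetic acid": ["carboxylic acid"],
    "benzoic acid": ["carboxylic acid"],
    "ethanol": ["alcohol"],
    "methanol": ["alcohol"],
    "water": ["inorganic molecule"],
}

# Invert the parent map so the hierarchy can be walked downwards as well.
CHILDREN = {concept: [] for concept in PARENTS}
for child, parents in PARENTS.items():
    for parent in parents:
        CHILDREN[parent].append(child)


def _closure(concept, edges):
    """Return the concept plus every node reachable through `edges`."""
    seen = {concept}
    stack = [concept]
    while stack:
        for nxt in edges[stack.pop()]:
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


def ancestors(concept):
    return _closure(concept, PARENTS)


def information_content(concept):
    """Structural IC (assumption): -log of the share of concepts subsumed."""
    return -math.log(len(_closure(concept, CHILDREN)) / len(PARENTS))


def h_index(concept):
    """Assumed definition: largest h such that h children have >= h children each."""
    counts = sorted((len(CHILDREN[c]) for c in CHILDREN[concept]), reverse=True)
    h = 0
    for rank, n_children in enumerate(counts, start=1):
        if n_children < rank:
            break
        h = rank
    return h


def _weighted_overlap(c1, c2, weight):
    """Sum of weights over shared ancestors divided by the sum over all ancestors."""
    a1, a2 = ancestors(c1), ancestors(c2)
    total = sum(weight(a) for a in a1 | a2)
    return sum(weight(a) for a in a1 & a2) / total if total else 0.0


def sim_ui(c1, c2, use_h=False):
    """simUI: each ancestor counts 1, or h + 1 in the h-weighted variant."""
    return _weighted_overlap(c1, c2, lambda a: h_index(a) + 1 if use_h else 1)


def sim_gic(c1, c2, use_h=False):
    """simGIC: each ancestor counts its IC, optionally scaled by h + 1."""
    return _weighted_overlap(
        c1, c2,
        lambda a: information_content(a) * (h_index(a) + 1 if use_h else 1),
    )


if __name__ == "__main__":
    for other in ("benzoic acid", "ethanol", "water"):
        plain = sim_ui("acetic acid", other)
        weighted = sim_ui("acetic acid", other, use_h=True)
        print(f"{other}: simUI={plain:.3f} simUI_h={weighted:.3f}")
```

Running it prints, for acetic acid against each other toy concept, the plain simUI and the h-weighted simUI. On this toy hierarchy the weighting raises every score, because the upper-level concepts, which are more often shared, carry higher h-indices than the leaves, while the ranking of the pairs stays the same.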


Bibliographic details
Main authors:	Lamurias, Andre; Ferreira, João D; Couto, Francisco M
Format:	Online Article Text
Language:	English
Published:	BioMed Central 2015
Subjects:	Research
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4331689/ https://www.ncbi.nlm.nih.gov/pubmed/25810770 http://dx.doi.org/10.1186/1758-2946-7-S1-S13

author	Lamurias, Andre; Ferreira, João D; Couto, Francisco M
collection	PubMed
description	BACKGROUND: Our approach to the BioCreative IV challenge of recognition and classification of drug names (CHEMDNER task) aimed at achieving high levels of precision by applying semantic similarity validation techniques to Chemical Entities of Biological Interest (ChEBI) mappings. Our assumption is that the chemical entities mentioned in the same fragment of text should share some semantic relation. This validation method was further improved by adapting the semantic similarity measure to take into account the h-index of each ancestor. We applied this method in two measures, simUI and simGIC, and validated the results obtained for the competition, comparing each adapted measure to its original version. RESULTS: For the competition, we trained a Random Forest classifier that uses various scores provided by our system, including semantic similarity, which improved the F-measure obtained with the Conditional Random Fields classifiers by 4.6%. Using a notion of concept relevance based on the h-index measure, we were able to enhance our validation process so that for a fixed recall, we increased precision by excluding from the results a higher amount of false positives. We plotted precision and recall values for a range of validation thresholds using different similarity measures, obtaining higher precision values for the same recall with the measures based on the h-index. CONCLUSIONS: The semantic similarity measure we introduced was more efficient at validating text mining results from machine learning classifiers than other measures. We improved the results we obtained for the CHEMDNER task by maintaining high precision values while improving the recall and F-measure.
format	Online Article Text
id	pubmed-4331689
institution	National Center for Biotechnology Information
language	English
publishDate	2015
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-4331689 (2015-03-25). Lamurias, Andre; Ferreira, João D; Couto, Francisco M. Improving chemical entity recognition through h-index based semantic similarity. J Cheminform. Research.
BioMed Central, 2015-01-19. /pmc/articles/PMC4331689/ /pubmed/25810770 http://dx.doi.org/10.1186/1758-2946-7-S1-S13. Text, en. Copyright © 2015 Lamurias et al.; licensee Springer. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title	Improving chemical entity recognition through h-index based semantic similarity
topic	Research
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4331689/ https://www.ncbi.nlm.nih.gov/pubmed/25810770 http://dx.doi.org/10.1186/1758-2946-7-S1-S13
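The abstract's validation step rests on the assumption that chemical entities mentioned in the same text fragment should be semantically related, and its results are reported as precision and recall over a range of validation thresholds. A minimal Python sketch of such a threshold sweep follows. The scoring rule (a candidate's best similarity to any other candidate in its fragment), the toy mentions, the similarity values, and the count of four gold entities are all hypothetical; any similarity function, for example an h-index-weighted simGIC, could be passed in as `sim`.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class Candidate:
    mention: str      # text span proposed by the classifier
    concept: str      # concept it was mapped to (toy names, not ChEBI IDs)
    is_correct: bool  # gold label, used only for evaluation


Similarity = Callable[[str, str], float]


def validation_scores(fragment: List[Candidate], sim: Similarity) -> List[float]:
    """Assumed rule: a candidate's score is its best similarity to any other
    candidate in the same fragment (0.0 if it is alone)."""
    return [
        max((sim(c.concept, o.concept) for j, o in enumerate(fragment) if j != i),
            default=0.0)
        for i, c in enumerate(fragment)
    ]


def precision_recall(fragments: List[List[Candidate]], sim: Similarity,
                     threshold: float, total_gold: int) -> Tuple[float, float]:
    """Keep candidates scoring >= threshold and evaluate them against gold."""
    tp = fp = 0
    for fragment in fragments:
        for cand, score in zip(fragment, validation_scores(fragment, sim)):
            if score >= threshold:
                if cand.is_correct:
                    tp += 1
                else:
                    fp += 1
    # Precision is set to 1.0 when nothing is kept (a convention, not a rule).
    precision = tp / (tp + fp) if tp + fp else 1.0
    recall = tp / total_gold if total_gold else 0.0
    return precision, recall


# Hypothetical pairwise similarities; unlisted pairs default to 0.1.
TOY_SIM = {
    frozenset({"acetic acid", "benzoic acid"}): 0.8,
    frozenset({"acetic acid", "ethanol"}): 0.6,
    frozenset({"benzoic acid", "ethanol"}): 0.5,
}


def toy_sim(a: str, b: str) -> float:
    return TOY_SIM.get(frozenset({a, b}), 0.1)


FRAGMENTS = [
    [Candidate("acetic acid", "acetic acid", True),
     Candidate("benzoic acid", "benzoic acid", True),
     Candidate("lead", "lead", False)],   # e.g. the verb "lead" mis-tagged
    [Candidate("ethanol", "ethanol", True),
     Candidate("iron", "iron", False)],   # e.g. "iron out" mis-tagged
]

if __name__ == "__main__":
    # Four gold entities in the toy corpus, one of which was never proposed.
    for t in (0.0, 0.25, 0.5, 0.75, 1.0):
        p, r = precision_recall(FRAGMENTS, toy_sim, t, total_gold=4)
        print(f"threshold={t:.2f} precision={p:.2f} recall={r:.2f}")
```

On this toy data, raising the threshold from 0.0 to 0.5 removes both false positives (precision rises from 0.60 to 1.00) but also drops the ethanol mention, whose only neighbour is unrelated (recall falls from 0.75 to 0.50). This is the kind of trade-off that a precision/recall curve over validation thresholds makes visible.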


Similar items