Improving chemical entity recognition through h-index based semantic similarity
BACKGROUND: Our approach to the BioCreative IV challenge of recognition and classification of drug names (CHEMDNER task) aimed at achieving high levels of precision by applying semantic similarity validation techniques to Chemical Entities of Biological Interest (ChEBI) mappings. Our assumption is t...
Main authors: | Lamurias, Andre; Ferreira, João D; Couto, Francisco M |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central, 2015 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4331689/ https://www.ncbi.nlm.nih.gov/pubmed/25810770 http://dx.doi.org/10.1186/1758-2946-7-S1-S13 |
_version_ | 1782357759254069248 |
---|---|
author | Lamurias, Andre Ferreira, João D Couto, Francisco M |
author_facet | Lamurias, Andre Ferreira, João D Couto, Francisco M |
author_sort | Lamurias, Andre |
collection | PubMed |
description | BACKGROUND: Our approach to the BioCreative IV challenge of recognition and classification of drug names (CHEMDNER task) aimed at achieving high levels of precision by applying semantic similarity validation techniques to Chemical Entities of Biological Interest (ChEBI) mappings. Our assumption is that the chemical entities mentioned in the same fragment of text should share some semantic relation. This validation method was further improved by adapting the semantic similarity measure to take into account the h-index of each ancestor. We applied this method in two measures, simUI and simGIC, and validated the results obtained for the competition, comparing each adapted measure to its original version. RESULTS: For the competition, we trained a Random Forest classifier that uses various scores provided by our system, including semantic similarity, which improved the F-measure obtained with the Conditional Random Fields classifiers by 4.6%. Using a notion of concept relevance based on the h-index measure, we were able to enhance our validation process so that for a fixed recall, we increased precision by excluding from the results a higher amount of false positives. We plotted precision and recall values for a range of validation thresholds using different similarity measures, obtaining higher precision values for the same recall with the measures based on the h-index. CONCLUSIONS: The semantic similarity measure we introduced was more efficient at validating text mining results from machine learning classifiers than other measures. We improved the results we obtained for the CHEMDNER task by maintaining high precision values while improving the recall and F-measure. |
format | Online Article Text |
id | pubmed-4331689 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2015 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-43316892015-03-25 Improving chemical entity recognition through h-index based semantic similarity Lamurias, Andre Ferreira, João D Couto, Francisco M J Cheminform Research BACKGROUND: Our approach to the BioCreative IV challenge of recognition and classification of drug names (CHEMDNER task) aimed at achieving high levels of precision by applying semantic similarity validation techniques to Chemical Entities of Biological Interest (ChEBI) mappings. Our assumption is that the chemical entities mentioned in the same fragment of text should share some semantic relation. This validation method was further improved by adapting the semantic similarity measure to take into account the h-index of each ancestor. We applied this method in two measures, simUI and simGIC, and validated the results obtained for the competition, comparing each adapted measure to its original version. RESULTS: For the competition, we trained a Random Forest classifier that uses various scores provided by our system, including semantic similarity, which improved the F-measure obtained with the Conditional Random Fields classifiers by 4.6%. Using a notion of concept relevance based on the h-index measure, we were able to enhance our validation process so that for a fixed recall, we increased precision by excluding from the results a higher amount of false positives. We plotted precision and recall values for a range of validation thresholds using different similarity measures, obtaining higher precision values for the same recall with the measures based on the h-index. CONCLUSIONS: The semantic similarity measure we introduced was more efficient at validating text mining results from machine learning classifiers than other measures. We improved the results we obtained for the CHEMDNER task by maintaining high precision values while improving the recall and F-measure. 
BioMed Central 2015-01-19 /pmc/articles/PMC4331689/ /pubmed/25810770 http://dx.doi.org/10.1186/1758-2946-7-S1-S13 Text en Copyright © 2015 Lamurias et al.; licensee Springer. http://creativecommons.org/licenses/by/4.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated. |
spellingShingle | Research Lamurias, Andre Ferreira, João D Couto, Francisco M Improving chemical entity recognition through h-index based semantic similarity |
title | Improving chemical entity recognition through h-index based semantic similarity |
title_full | Improving chemical entity recognition through h-index based semantic similarity |
title_fullStr | Improving chemical entity recognition through h-index based semantic similarity |
title_full_unstemmed | Improving chemical entity recognition through h-index based semantic similarity |
title_short | Improving chemical entity recognition through h-index based semantic similarity |
title_sort | improving chemical entity recognition through h-index based semantic similarity |
topic | Research |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4331689/ https://www.ncbi.nlm.nih.gov/pubmed/25810770 http://dx.doi.org/10.1186/1758-2946-7-S1-S13 |
work_keys_str_mv | AT lamuriasandre improvingchemicalentityrecognitionthroughhindexbasedsemanticsimilarity AT ferreirajoaod improvingchemicalentityrecognitionthroughhindexbasedsemanticsimilarity AT coutofranciscom improvingchemicalentityrecognitionthroughhindexbasedsemanticsimilarity |
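
The description above names two set-based semantic similarity measures, simUI and simGIC, and says they were adapted to take the h-index of each ancestor into account. As a rough illustration only, the sketch below uses the standard definitions of the two measures: simUI as the ratio of shared ancestors to all ancestors, and simGIC as the same ratio weighted by information content (IC). This record does not say how the h-index enters the measures, so the `filter_relevant` helper is a hypothetical stand-in that simply drops ancestors whose relevance score falls below a cutoff; the concept identifiers, IC values, and relevance scores are invented toy data, not ChEBI values.

```python
from typing import Dict, Set


def sim_ui(anc_a: Set[str], anc_b: Set[str]) -> float:
    """simUI: shared ancestors divided by all ancestors of both concepts."""
    union = anc_a | anc_b
    if not union:
        return 0.0
    return len(anc_a & anc_b) / len(union)


def sim_gic(anc_a: Set[str], anc_b: Set[str], ic: Dict[str, float]) -> float:
    """simGIC: like simUI, but each ancestor is weighted by its information content."""
    denominator = sum(ic[term] for term in anc_a | anc_b)
    if denominator == 0:
        return 0.0
    return sum(ic[term] for term in anc_a & anc_b) / denominator


def filter_relevant(ancestors: Set[str], relevance: Dict[str, int],
                    min_relevance: int) -> Set[str]:
    """Hypothetical helper (assumption): keep only ancestors whose relevance
    score, e.g. an h-index-like value, reaches the cutoff."""
    return {t for t in ancestors if relevance.get(t, 0) >= min_relevance}


# Toy data (assumed): ancestor sets of two concepts, including a shared root.
anc_x = {"root", "A", "B"}
anc_y = {"root", "A", "C"}
ic = {"root": 0.0, "A": 1.0, "B": 2.0, "C": 2.0}
relevance = {"root": 3, "A": 2, "B": 0, "C": 1}

plain_ui = sim_ui(anc_x, anc_y)
plain_gic = sim_gic(anc_x, anc_y, ic)
filtered_ui = sim_ui(filter_relevant(anc_x, relevance, 1),
                     filter_relevant(anc_y, relevance, 1))
print(plain_ui, plain_gic, filtered_ui)
```

In a validation step of the kind the description outlines, a recognized entity would be kept only if its similarity to the other entities in the same text fragment reaches a chosen threshold; sweeping that threshold trades recall for precision.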