
Contextual weighting for Support Vector Machines in literature mining: an application to gene versus protein name disambiguation

BACKGROUND: The ability to distinguish between genes and proteins is essential for understanding biological text. Support Vector Machines (SVMs) have been proven to be very efficient in general data mining tasks. We explore their capability for the gene versus protein name disambiguation task. RESUL...

Bibliographic Details
Main Authors: Pahikkala, Tapio, Ginter, Filip, Boberg, Jorma, Järvinen, Jouni, Salakoski, Tapio
Format: Text
Language: English
Published: BioMed Central 2005
Subjects: Research Article
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1180820/
https://www.ncbi.nlm.nih.gov/pubmed/15972097
http://dx.doi.org/10.1186/1471-2105-6-157
author Pahikkala, Tapio
Ginter, Filip
Boberg, Jorma
Järvinen, Jouni
Salakoski, Tapio
collection PubMed
description BACKGROUND: The ability to distinguish between genes and proteins is essential for understanding biological text. Support Vector Machines (SVMs) have been proven to be very efficient in general data mining tasks. We explore their capability for the gene versus protein name disambiguation task. RESULTS: We incorporated into the conventional SVM a weighting scheme based on the distances of context words from the word to be disambiguated. This weighting scheme increased the performance of SVMs by five percentage points, giving an area under the ROC curve above 85%, and outperformed both the Weighted Additive Classifier, which also incorporates the weighting, and the Naive Bayes classifier. CONCLUSION: We show that the performance of SVMs can be improved by the proposed weighting scheme. Furthermore, our results suggest that, in this study, the gain in classification performance from the weighting is greater than the gain from the choice of the underlying classifier or of the SVM kernel.
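The idea in the description, weighting context words by their distance from the name being disambiguated and scoring with the area under the ROC curve, can be sketched in a few lines of Python. This is only an illustration: the record does not state the paper's actual weighting function, so the reciprocal decay `1/d`, the `window` parameter, and the function names `weighted_context_features` and `auc` are all assumptions made here. The resulting sparse vectors would then be passed to a classifier such as an SVM, which is not shown.

```python
from collections import defaultdict


def weighted_context_features(tokens, target_index, window=5):
    """Sparse bag-of-words vector for the context of tokens[target_index].

    Each context word contributes 1/d, where d is its distance in tokens
    from the target word. The 1/d decay is an illustrative assumption,
    not the weighting function used in the paper.
    """
    features = defaultdict(float)
    start = max(0, target_index - window)
    end = min(len(tokens), target_index + window + 1)
    for i in range(start, end):
        if i == target_index:
            continue  # the ambiguous name itself is not a context feature
        distance = abs(i - target_index)
        features[tokens[i].lower()] += 1.0 / distance
    return dict(features)


def auc(scores, labels):
    """Area under the ROC curve by pairwise comparison (ties count 0.5)."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


if __name__ == "__main__":
    # Hypothetical example sentence; "p53" is the name to disambiguate.
    sentence = "the expression of p53 protein was increased".split()
    print(weighted_context_features(sentence, target_index=3))
    # Hypothetical classifier scores and gold labels (1 = protein, 0 = gene).
    print(auc([0.9, 0.8, 0.3, 0.1], [1, 0, 1, 0]))
```

In this sketch, words adjacent to the name ("of", "protein") receive weight 1.0, while words three tokens away receive about 0.33, so nearby context dominates the feature vector.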
format Text
id pubmed-1180820
institution National Center for Biotechnology Information
language English
publishDate 2005
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-1180820; 2005-07-28; BMC Bioinformatics; Research Article; BioMed Central; 2005-06-22; Text; en. Copyright © 2005 Pahikkala et al; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title Contextual weighting for Support Vector Machines in literature mining: an application to gene versus protein name disambiguation
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1180820/
https://www.ncbi.nlm.nih.gov/pubmed/15972097
http://dx.doi.org/10.1186/1471-2105-6-157