A detailed error analysis of 13 kernel methods for protein–protein interaction extraction

BACKGROUND: Kernel-based classification is the current state-of-the-art for extracting pairs of interacting proteins (PPIs) from free text. Various proposals have been put forward, which diverge especially in the specific kernel function, the type of input representation, and the feature sets. These...

Bibliographic Details
Main authors:	Tikk, Domonkos; Solt, Illés; Thomas, Philippe; Leser, Ulf
Format:	Online Article Text
Language:	English
Published:	BioMed Central 2013
Subjects:	Research Article
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3680070/ https://www.ncbi.nlm.nih.gov/pubmed/23323857 http://dx.doi.org/10.1186/1471-2105-14-12

author	Tikk, Domonkos; Solt, Illés; Thomas, Philippe; Leser, Ulf
collection	PubMed
description	BACKGROUND: Kernel-based classification is the current state-of-the-art for extracting pairs of interacting proteins (PPIs) from free text. Various proposals have been put forward, which diverge especially in the specific kernel function, the type of input representation, and the feature sets. These proposals are regularly compared to each other regarding their overall performance on different gold standard corpora, but little is known about their respective performance on the instance level. RESULTS: We report on a detailed analysis of the shared characteristics and the differences between 13 current methods using five PPI corpora. We identified a large number of rather difficult (misclassified by most methods) and easy (correctly classified by most methods) PPIs. We show that kernels using the same input representation perform similarly on these pairs and that building ensembles using dissimilar kernels leads to significant performance gain. However, our analysis also reveals that characteristics shared between difficult pairs are few, which lowers the hope that new methods, if built along the same line as current ones, will deliver breakthroughs in extraction performance. CONCLUSIONS: Our experiments show that current methods do not seem to do very well in capturing the shared characteristics of positive PPI pairs, which must also be attributed to the heterogeneity of the (still very few) available corpora. Our analysis suggests that performance improvements shall be sought after rather in novel feature sets than in novel kernel functions.
format	Online Article Text
id	pubmed-3680070
institution	National Center for Biotechnology Information
language	English
publishDate	2013
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-3680070 2013-06-25 BMC Bioinformatics Research Article BioMed Central 2013-01-16 /pmc/articles/PMC3680070/ /pubmed/23323857 http://dx.doi.org/10.1186/1471-2105-14-12 Text en Copyright © 2013 Tikk et al.; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title	A detailed error analysis of 13 kernel methods for protein–protein interaction extraction
topic	Research Article
