A Comprehensive Benchmark of Kernel Methods to Extract Protein–Protein Interactions from Literature

Bibliographic Details
Main authors:	Tikk, Domonkos, Thomas, Philippe, Palaga, Peter, Hakenberg, Jörg, Leser, Ulf
Format:	Text
Language:	English
Published:	Public Library of Science 2010
Subjects:	Research Article
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2895635/ https://www.ncbi.nlm.nih.gov/pubmed/20617200 http://dx.doi.org/10.1371/journal.pcbi.1000837

collection	PubMed
description	The most important way of conveying new findings in biomedical research is scientific publication. Extraction of protein–protein interactions (PPIs) reported in scientific publications is one of the core topics of text mining in the life sciences. Recently, a new class of such methods has been proposed: convolution kernels that identify PPIs using deep parses of sentences. However, comparing published results of different PPI extraction methods is impossible due to the use of different evaluation corpora, different evaluation metrics, different tuning procedures, etc. In this paper, we study whether the reported performance metrics are robust across different corpora and learning settings and whether the use of deep parsing actually leads to an increase in extraction quality. Our ultimate goal is to identify the one method that performs best in real-life scenarios, where information extraction is performed on unseen text and not on specifically prepared evaluation data. We performed a comprehensive benchmarking of nine different methods for PPI extraction that use convolution kernels on rich linguistic information. Methods were evaluated on five different public corpora using cross-validation, cross-learning, and cross-corpus evaluation. Our study confirms that kernels using dependency trees generally outperform kernels based on syntax trees. However, our study also shows that only the best kernel methods can compete with a simple rule-based approach when the evaluation prevents information leakage between training and test corpora. Our results further reveal that the F-score of many approaches drops significantly if no corpus-specific parameter optimization is applied and that methods reaching a good AUC score often perform much worse in terms of F-score. We conclude that for most kernels no sensible estimation of PPI extraction performance on new text is possible, given the current heterogeneity in evaluation data. Nevertheless, our study shows that three kernels are clearly superior to the other methods.
id	pubmed-2895635
institution	National Center for Biotechnology Information
record_format	MEDLINE/PubMed
Journal:	PLoS Comput Biol
Publication date:	2010-07-01
Copyright:	Tikk et al.
License:	Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/). This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are properly credited.
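The abstract observes that methods with a good AUC score often perform much worse in terms of F-score. A toy illustration of why this can happen is sketched below. It is not taken from the paper: the labels, scores, and the helper names `roc_auc` and `f_score` are hypothetical. AUC is computed here as the fraction of positive/negative pairs that are ranked correctly (ties count one half), so it depends only on the ordering of the scores. The F-score depends on where the decision threshold falls; the default of 0 is an assumption, mimicking the sign of a raw SVM decision value.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random positive outscores a random negative."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0  # pair ranked correctly
            elif p == n:
                wins += 0.5  # tie counts as half a correct ranking
    return wins / (len(pos) * len(neg))


def f_score(labels: Sequence[int], scores: Sequence[float], threshold: float = 0.0) -> float:
    """Balanced F-score (F1) when every score above `threshold` is predicted positive."""
    tp = sum(1 for y, s in zip(labels, scores) if y == 1 and s > threshold)
    fp = sum(1 for y, s in zip(labels, scores) if y == 0 and s > threshold)
    fn = sum(1 for y, s in zip(labels, scores) if y == 1 and s <= threshold)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Hypothetical scores: every positive outranks every negative,
    # but most positives fall below the default threshold of 0.
    labels = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
    scores = [0.4, -0.1, -0.2, -0.3, -0.5, -0.6, -0.7, -0.8, -0.9, -1.0]
    print(roc_auc(labels, scores))        # 1.0
    print(f_score(labels, scores, 0.0))   # 0.4
    print(f_score(labels, scores, -0.4))  # 1.0
```

The ranking is perfect (AUC 1.0), yet at the default threshold only one of four positives is predicted, giving F = 0.4; moving the threshold to -0.4 recovers F = 1.0. A threshold tuned on one corpus need not transfer to another, which is one plausible mechanism, under these assumptions, behind the gap between AUC and F-score.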
