All-paths graph kernel for protein-protein interaction extraction with evaluation of cross-corpus learning


Bibliographic Details
Main authors: Airola, Antti; Pyysalo, Sampo; Björne, Jari; Pahikkala, Tapio; Ginter, Filip; Salakoski, Tapio
Format: Text
Language: English
Published: BioMed Central 2008
Subjects: Research
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2586751/
https://www.ncbi.nlm.nih.gov/pubmed/19025688
http://dx.doi.org/10.1186/1471-2105-9-S11-S2
Description

BACKGROUND: Automated extraction of protein-protein interactions (PPI) is an important and widely studied task in biomedical text mining. We propose a graph kernel based approach for this task. In contrast to earlier approaches to PPI extraction, the introduced all-paths graph kernel has the capability to make use of full, general dependency graphs representing the sentence structure.

RESULTS: We evaluate the proposed method on five publicly available PPI corpora, providing the most comprehensive evaluation done for a machine learning based PPI-extraction system. We additionally perform a detailed evaluation of the effects of training and testing on different resources, providing insight into the challenges involved in applying a system beyond the data it was trained on. Our method is shown to achieve state-of-the-art performance with respect to comparable evaluations, with 56.4 F-score and 84.8 AUC on the AImed corpus.

CONCLUSION: We show that the graph kernel approach performs on state-of-the-art level in PPI extraction, and note the possible extension to the task of extracting complex interactions. Cross-corpus results provide further insight into how the learning generalizes beyond individual corpora. Further, we identify several pitfalls that can make evaluations of PPI-extraction systems incomparable, or even invalid. These include incorrect cross-validation strategies and problems related to comparing F-score results achieved on different evaluation resources. Recommendations for avoiding these pitfalls are provided.
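This record does not explain how the all-paths graph kernel is computed. The sketch below shows one common way to build a kernel of this kind. It is not the authors' implementation, and it rests on these assumptions:

* Each sentence graph is encoded as a weighted adjacency matrix A. Edge weights are kept below 1 so that longer connections count for less.
* Each vertex carries one or more string labels.
* The summed weight of all walks of length 1 or more is (I - A)^-1 - I. This series converges only when the spectral radius of A is below 1.
* The kernel is the inner product of the resulting label-pair feature maps.

All function names, the graph encoding, and the example labels and weights are hypothetical.

```python
from collections import defaultdict
from itertools import product
from typing import Dict, List, Sequence, Tuple

Matrix = List[List[float]]
# Hypothetical encoding: (weighted adjacency matrix, list of labels for each vertex).
Graph = Tuple[Matrix, Sequence[Sequence[str]]]


def invert(m: Matrix) -> Matrix:
    """Invert a square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(m)
    aug = [[float(x) for x in row] + [1.0 if r == c else 0.0 for c in range(n)]
           for r, row in enumerate(m)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [x / p for x in aug[col]]
        for r in range(n):
            factor = aug[r][col]
            if r != col and factor != 0.0:
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def all_paths_matrix(adj: Matrix) -> Matrix:
    """Return sum_{k>=1} A^k, computed in closed form as (I - A)^-1 - I.

    Entry (u, v) is the summed weight of every walk from u to v, where a walk's
    weight is the product of its edge weights. Assumes spectral radius of A < 1.
    """
    n = len(adj)
    i_minus_a = [[(1.0 if r == c else 0.0) - adj[r][c] for c in range(n)]
                 for r in range(n)]
    inv = invert(i_minus_a)
    return [[inv[r][c] - (1.0 if r == c else 0.0) for c in range(n)]
            for r in range(n)]


def label_features(graph: Graph) -> Dict[Tuple[str, str], float]:
    """Project vertex-pair weights onto label pairs (the entries of L G L^T)."""
    adj, labels = graph
    g = all_paths_matrix(adj)
    feats: Dict[Tuple[str, str], float] = defaultdict(float)
    for u, v in product(range(len(adj)), repeat=2):
        if g[u][v] == 0.0:
            continue  # no walk connects u to v, so nothing to add
        for lu in labels[u]:
            for lv in labels[v]:
                feats[(lu, lv)] += g[u][v]
    return dict(feats)


def all_paths_kernel(g1: Graph, g2: Graph) -> float:
    """Kernel value: inner product of the two label-pair feature maps."""
    f1, f2 = label_features(g1), label_features(g2)
    return sum(w * f2[key] for key, w in f1.items() if key in f2)


# Hypothetical toy graphs; labels and edge weights are illustrative only.
sent_a = ([[0.0, 0.5], [0.0, 0.0]], [["PROT1"], ["binds"]])
sent_b = ([[0.0, 0.5], [0.5, 0.0]], [["PROT1"], ["binds"]])
print(all_paths_kernel(sent_a, sent_b))  # roughly 1/3
```

The kernel is an explicit inner product of feature maps, so it is symmetric and positive semidefinite. It can therefore be supplied to any kernel method that accepts a precomputed Gram matrix. The closed-form inverse is used here in place of summing powers of A term by term.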
Journal: BMC Bioinformatics (Research), published by BioMed Central on 2008-11-19.
Copyright © 2008 Airola et al; licensee BioMed Central Ltd. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.