All-paths graph kernel for protein-protein interaction extraction with evaluation of cross-corpus learning
BACKGROUND: Automated extraction of protein-protein interactions (PPI) is an important and widely studied task in biomedical text mining. We propose a graph kernel based approach for this task. In contrast to earlier approaches to PPI extraction, the introduced all-paths graph kernel has the capabil...
Main Authors: | Airola, Antti; Pyysalo, Sampo; Björne, Jari; Pahikkala, Tapio; Ginter, Filip; Salakoski, Tapio |
---|---|
Format: | Text |
Language: | English |
Published: | BioMed Central 2008 |
Subjects: | Research |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2586751/ https://www.ncbi.nlm.nih.gov/pubmed/19025688 http://dx.doi.org/10.1186/1471-2105-9-S11-S2 |
_version_ | 1782160908146966528 |
---|---|
author | Airola, Antti Pyysalo, Sampo Björne, Jari Pahikkala, Tapio Ginter, Filip Salakoski, Tapio |
author_facet | Airola, Antti Pyysalo, Sampo Björne, Jari Pahikkala, Tapio Ginter, Filip Salakoski, Tapio |
author_sort | Airola, Antti |
collection | PubMed |
description | BACKGROUND: Automated extraction of protein-protein interactions (PPI) is an important and widely studied task in biomedical text mining. We propose a graph kernel based approach for this task. In contrast to earlier approaches to PPI extraction, the introduced all-paths graph kernel has the capability to make use of full, general dependency graphs representing the sentence structure. RESULTS: We evaluate the proposed method on five publicly available PPI corpora, providing the most comprehensive evaluation done for a machine learning based PPI-extraction system. We additionally perform a detailed evaluation of the effects of training and testing on different resources, providing insight into the challenges involved in applying a system beyond the data it was trained on. Our method is shown to achieve state-of-the-art performance with respect to comparable evaluations, with 56.4 F-score and 84.8 AUC on the AImed corpus. CONCLUSION: We show that the graph kernel approach performs on state-of-the-art level in PPI extraction, and note the possible extension to the task of extracting complex interactions. Cross-corpus results provide further insight into how the learning generalizes beyond individual corpora. Further, we identify several pitfalls that can make evaluations of PPI-extraction systems incomparable, or even invalid. These include incorrect cross-validation strategies and problems related to comparing F-score results achieved on different evaluation resources. Recommendations for avoiding these pitfalls are provided. |
format | Text |
id | pubmed-2586751 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2008 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-25867512008-11-26 All-paths graph kernel for protein-protein interaction extraction with evaluation of cross-corpus learning Airola, Antti Pyysalo, Sampo Björne, Jari Pahikkala, Tapio Ginter, Filip Salakoski, Tapio BMC Bioinformatics Research BACKGROUND: Automated extraction of protein-protein interactions (PPI) is an important and widely studied task in biomedical text mining. We propose a graph kernel based approach for this task. In contrast to earlier approaches to PPI extraction, the introduced all-paths graph kernel has the capability to make use of full, general dependency graphs representing the sentence structure. RESULTS: We evaluate the proposed method on five publicly available PPI corpora, providing the most comprehensive evaluation done for a machine learning based PPI-extraction system. We additionally perform a detailed evaluation of the effects of training and testing on different resources, providing insight into the challenges involved in applying a system beyond the data it was trained on. Our method is shown to achieve state-of-the-art performance with respect to comparable evaluations, with 56.4 F-score and 84.8 AUC on the AImed corpus. CONCLUSION: We show that the graph kernel approach performs on state-of-the-art level in PPI extraction, and note the possible extension to the task of extracting complex interactions. Cross-corpus results provide further insight into how the learning generalizes beyond individual corpora. Further, we identify several pitfalls that can make evaluations of PPI-extraction systems incomparable, or even invalid. These include incorrect cross-validation strategies and problems related to comparing F-score results achieved on different evaluation resources. Recommendations for avoiding these pitfalls are provided. BioMed Central 2008-11-19 /pmc/articles/PMC2586751/ /pubmed/19025688 http://dx.doi.org/10.1186/1471-2105-9-S11-S2 Text en Copyright © 2008 Airola et al; licensee BioMed Central Ltd. 
http://creativecommons.org/licenses/by/2.0 This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. |
spellingShingle | Research Airola, Antti Pyysalo, Sampo Björne, Jari Pahikkala, Tapio Ginter, Filip Salakoski, Tapio All-paths graph kernel for protein-protein interaction extraction with evaluation of cross-corpus learning |
title | All-paths graph kernel for protein-protein interaction extraction with evaluation of cross-corpus learning |
title_full | All-paths graph kernel for protein-protein interaction extraction with evaluation of cross-corpus learning |
title_fullStr | All-paths graph kernel for protein-protein interaction extraction with evaluation of cross-corpus learning |
title_full_unstemmed | All-paths graph kernel for protein-protein interaction extraction with evaluation of cross-corpus learning |
title_short | All-paths graph kernel for protein-protein interaction extraction with evaluation of cross-corpus learning |
title_sort | all-paths graph kernel for protein-protein interaction extraction with evaluation of cross-corpus learning |
topic | Research |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2586751/ https://www.ncbi.nlm.nih.gov/pubmed/19025688 http://dx.doi.org/10.1186/1471-2105-9-S11-S2 |
work_keys_str_mv | AT airolaantti allpathsgraphkernelforproteinproteininteractionextractionwithevaluationofcrosscorpuslearning AT pyysalosampo allpathsgraphkernelforproteinproteininteractionextractionwithevaluationofcrosscorpuslearning AT bjornejari allpathsgraphkernelforproteinproteininteractionextractionwithevaluationofcrosscorpuslearning AT pahikkalatapio allpathsgraphkernelforproteinproteininteractionextractionwithevaluationofcrosscorpuslearning AT ginterfilip allpathsgraphkernelforproteinproteininteractionextractionwithevaluationofcrosscorpuslearning AT salakoskitapio allpathsgraphkernelforproteinproteininteractionextractionwithevaluationofcrosscorpuslearning |