
Phylogenetic tree information aids supervised learning for predicting protein-protein interaction based on distance matrices

BACKGROUND: Protein-protein interactions are critical for cellular functions. Recently developed computational approaches for predicting protein-protein interactions utilize co-evolutionary information of the interacting partners, e.g., correlations between distance matrices, where each matrix store...


Bibliographic Details
Main authors: Craig, Roger A; Liao, Li
Format: Text
Language: English
Published: BioMed Central 2007
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1781468/
https://www.ncbi.nlm.nih.gov/pubmed/17212819
http://dx.doi.org/10.1186/1471-2105-8-6
author Craig, Roger A
Liao, Li
collection PubMed
description BACKGROUND: Protein-protein interactions are critical for cellular functions. Recently developed computational approaches for predicting protein-protein interactions utilize co-evolutionary information of the interacting partners, e.g., correlations between distance matrices, where each matrix stores the pairwise distances between a protein and its orthologs from a group of reference genomes. RESULTS: We proposed a novel, simple method to account for some of the intra-matrix correlations in improving the prediction accuracy. Specifically, the phylogenetic species tree of the reference genomes is used as a guide tree for hierarchical clustering of the orthologous proteins. The distances between these clusters, derived from the original pairwise distance matrix using the Neighbor Joining algorithm, form intermediate distance matrices, which are then transformed and concatenated into a super phylogenetic vector. A support vector machine is trained and tested on pairs of proteins, represented as super phylogenetic vectors, whose interactions are known. The performance, measured as ROC score in cross validation experiments, shows significant improvement of our method (ROC score 0.8446) over that of using Pearson correlations (0.6587). CONCLUSION: We have shown that the phylogenetic tree can be used as a guide to extract intra-matrix correlations in the distance matrices of orthologous proteins, where these correlations are represented as intermediate distance matrices of the ancestral orthologous proteins. Both the unsupervised and supervised learning paradigms benefit from the explicit inclusion of these intermediate distance matrices, and particularly so in the latter case, which offers a better balance between sensitivity and specificity in the prediction of protein-protein interactions.
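The Pearson-correlation baseline mentioned in the abstract can be illustrated with a minimal sketch. It assumes each protein is represented by a symmetric distance matrix over the same ordered set of reference genomes, and that the correlation is taken over the strictly upper triangles of the two matrices. The function names (`upper_triangle`, `pearson`, `matrix_correlation`, `pair_vector`) and the toy matrices are hypothetical, chosen for illustration. The plain concatenation in `pair_vector` is only an assumed stand-in for a vector representation of a protein pair, not the authors' super phylogenetic vector, which also includes transformed intermediate matrices derived from the species tree.

```python
from math import sqrt


def upper_triangle(matrix):
    """Flatten the strictly upper triangle of a square distance matrix, row by row."""
    n = len(matrix)
    return [matrix[i][j] for i in range(n) for j in range(i + 1, n)]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)


def matrix_correlation(dist_a, dist_b):
    """Correlate two distance matrices built over the same reference genomes."""
    return pearson(upper_triangle(dist_a), upper_triangle(dist_b))


def pair_vector(dist_a, dist_b):
    """Assumed pair representation: concatenated upper triangles of both matrices."""
    return upper_triangle(dist_a) + upper_triangle(dist_b)


# Toy matrices (hypothetical values) over three reference genomes.
dist_a = [[0, 1, 2],
          [1, 0, 3],
          [2, 3, 0]]
dist_b = [[0, 2, 4],
          [2, 0, 6],
          [4, 6, 0]]

score = matrix_correlation(dist_a, dist_b)
features = pair_vector(dist_a, dist_b)
```

In a baseline of this kind, a high correlation score is read as evidence of co-evolution and hence of a possible interaction. A supervised approach would instead pass a vector such as `features` to a classifier trained on pairs whose interaction status is known.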
format Text
id pubmed-1781468
institution National Center for Biotechnology Information
language English
publishDate 2007
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-1781468 2007-01-30 Phylogenetic tree information aids supervised learning for predicting protein-protein interaction based on distance matrices Craig, Roger A Liao, Li BMC Bioinformatics Research Article BioMed Central 2007-01-09 /pmc/articles/PMC1781468/ /pubmed/17212819 http://dx.doi.org/10.1186/1471-2105-8-6 Text en Copyright © 2007 Craig and Liao; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title Phylogenetic tree information aids supervised learning for predicting protein-protein interaction based on distance matrices
topic Research Article