Phylogenetic tree information aids supervised learning for predicting protein-protein interaction based on distance matrices
BACKGROUND: Protein-protein interactions are critical for cellular functions. Recently developed computational approaches for predicting protein-protein interactions utilize co-evolutionary information of the interacting partners, e.g., correlations between distance matrices, where each matrix store...
Main authors: | Craig, Roger A, Liao, Li |
---|---|
Format: | Text |
Language: | English |
Published: | BioMed Central 2007 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1781468/ https://www.ncbi.nlm.nih.gov/pubmed/17212819 http://dx.doi.org/10.1186/1471-2105-8-6 |
_version_ | 1782131887281537024 |
---|---|
author | Craig, Roger A Liao, Li |
author_facet | Craig, Roger A Liao, Li |
author_sort | Craig, Roger A |
collection | PubMed |
description | BACKGROUND: Protein-protein interactions are critical for cellular functions. Recently developed computational approaches for predicting protein-protein interactions utilize co-evolutionary information of the interacting partners, e.g., correlations between distance matrices, where each matrix stores the pairwise distances between a protein and its orthologs from a group of reference genomes. RESULTS: We proposed a novel, simple method to account for some of the intra-matrix correlations in improving the prediction accuracy. Specifically, the phylogenetic species tree of the reference genomes is used as a guide tree for hierarchical clustering of the orthologous proteins. The distances between these clusters, derived from the original pairwise distance matrix using the Neighbor Joining algorithm, form intermediate distance matrices, which are then transformed and concatenated into a super phylogenetic vector. A support vector machine is trained and tested on pairs of proteins, represented as super phylogenetic vectors, whose interactions are known. The performance, measured as ROC score in cross validation experiments, shows significant improvement of our method (ROC score 0.8446) over that of using Pearson correlations (0.6587). CONCLUSION: We have shown that the phylogenetic tree can be used as a guide to extract intra-matrix correlations in the distance matrices of orthologous proteins, where these correlations are represented as intermediate distance matrices of the ancestral orthologous proteins. Both the unsupervised and supervised learning paradigms benefit from the explicit inclusion of these intermediate distance matrices, and particularly so in the latter case, which offers a better balance between sensitivity and specificity in the prediction of protein-protein interactions. |
format | Text |
id | pubmed-1781468 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2007 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-1781468 2007-01-30 Phylogenetic tree information aids supervised learning for predicting protein-protein interaction based on distance matrices Craig, Roger A Liao, Li BMC Bioinformatics Research Article BACKGROUND: Protein-protein interactions are critical for cellular functions. Recently developed computational approaches for predicting protein-protein interactions utilize co-evolutionary information of the interacting partners, e.g., correlations between distance matrices, where each matrix stores the pairwise distances between a protein and its orthologs from a group of reference genomes. RESULTS: We proposed a novel, simple method to account for some of the intra-matrix correlations in improving the prediction accuracy. Specifically, the phylogenetic species tree of the reference genomes is used as a guide tree for hierarchical clustering of the orthologous proteins. The distances between these clusters, derived from the original pairwise distance matrix using the Neighbor Joining algorithm, form intermediate distance matrices, which are then transformed and concatenated into a super phylogenetic vector. A support vector machine is trained and tested on pairs of proteins, represented as super phylogenetic vectors, whose interactions are known. The performance, measured as ROC score in cross validation experiments, shows significant improvement of our method (ROC score 0.8446) over that of using Pearson correlations (0.6587). CONCLUSION: We have shown that the phylogenetic tree can be used as a guide to extract intra-matrix correlations in the distance matrices of orthologous proteins, where these correlations are represented as intermediate distance matrices of the ancestral orthologous proteins. Both the unsupervised and supervised learning paradigms benefit from the explicit inclusion of these intermediate distance matrices, and particularly so in the latter case, which offers a better balance between sensitivity and specificity in the prediction of protein-protein interactions. BioMed Central 2007-01-09 /pmc/articles/PMC1781468/ /pubmed/17212819 http://dx.doi.org/10.1186/1471-2105-8-6 Text en Copyright © 2007 Craig and Liao; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. |
spellingShingle | Research Article Craig, Roger A Liao, Li Phylogenetic tree information aids supervised learning for predicting protein-protein interaction based on distance matrices |
title | Phylogenetic tree information aids supervised learning for predicting protein-protein interaction based on distance matrices |
title_full | Phylogenetic tree information aids supervised learning for predicting protein-protein interaction based on distance matrices |
title_fullStr | Phylogenetic tree information aids supervised learning for predicting protein-protein interaction based on distance matrices |
title_full_unstemmed | Phylogenetic tree information aids supervised learning for predicting protein-protein interaction based on distance matrices |
title_short | Phylogenetic tree information aids supervised learning for predicting protein-protein interaction based on distance matrices |
title_sort | phylogenetic tree information aids supervised learning for predicting protein-protein interaction based on distance matrices |
topic | Research Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1781468/ https://www.ncbi.nlm.nih.gov/pubmed/17212819 http://dx.doi.org/10.1186/1471-2105-8-6 |
work_keys_str_mv | AT craigrogera phylogenetictreeinformationaidssupervisedlearningforpredictingproteinproteininteractionbasedondistancematrices AT liaoli phylogenetictreeinformationaidssupervisedlearningforpredictingproteinproteininteractionbasedondistancematrices |
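
The abstract in this record names Pearson correlation between distance matrices as the baseline that the authors' method is compared against. As a rough illustration of that baseline only, the sketch below correlates the upper triangles of two small distance matrices. It is not the authors' code: the function names, the toy matrices, and the choice to compare strict upper triangles are all assumptions made for this example, and the guide-tree clustering and support vector machine steps are not shown.

```python
from math import sqrt


def upper_triangle(matrix):
    """Flatten the strict upper triangle of a square matrix, row by row."""
    n = len(matrix)
    return [matrix[i][j] for i in range(n) for j in range(i + 1, n)]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)


def matrix_correlation(dist_a, dist_b):
    """Correlate two distance matrices built over the same reference genomes."""
    return pearson(upper_triangle(dist_a), upper_triangle(dist_b))


# Hypothetical toy data: symmetric distance matrices for two proteins,
# each over the same four reference genomes (values are made up).
DIST_A = [
    [0.0, 0.2, 0.5, 0.7],
    [0.2, 0.0, 0.4, 0.6],
    [0.5, 0.4, 0.0, 0.3],
    [0.7, 0.6, 0.3, 0.0],
]
# DIST_B is DIST_A scaled by 2, so the two matrices are perfectly correlated.
DIST_B = [[2 * d for d in row] for row in DIST_A]

if __name__ == "__main__":
    print(matrix_correlation(DIST_A, DIST_B))
```

A high correlation between the two matrices is read, in this family of methods, as a sign of co-evolution and therefore of a possible interaction. Per the abstract, the approach in the article goes further by adding intermediate, tree-guided distance matrices to the representation.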