An ensemble approach for large-scale identification of protein-protein interactions using the alignments of multiple sequences

Bibliographic Details
Main Authors: Wang, Lei; You, Zhu-Hong; Chen, Xing; Li, Jian-Qiang; Yan, Xin; Zhang, Wei; Huang, Yu-An
Format: Online Article Text
Language: English
Published: Impact Journals LLC 2016
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5354898/
https://www.ncbi.nlm.nih.gov/pubmed/28029645
http://dx.doi.org/10.18632/oncotarget.14103
_version_ 1782515424264454144
author Wang, Lei
You, Zhu-Hong
Chen, Xing
Li, Jian-Qiang
Yan, Xin
Zhang, Wei
Huang, Yu-An
author_facet Wang, Lei
You, Zhu-Hong
Chen, Xing
Li, Jian-Qiang
Yan, Xin
Zhang, Wei
Huang, Yu-An
author_sort Wang, Lei
collection PubMed
description Protein–protein interactions (PPIs) are not only a critical component of various biological processes in cells but also key to understanding the mechanisms leading to healthy and diseased states in organisms. However, identifying interactions among proteins through biological experiments is time-consuming and costly. Hence, developing more efficient computational methods has rapidly become an attractive topic in the post-genomic era. In this paper, we propose a novel method for inferring protein–protein interactions from amino acid sequences alone. Specifically, each protein's amino acid sequence is first transformed into a Position-Specific Scoring Matrix (PSSM) generated from multiple sequence alignments; the Pseudo-PSSM is then used to extract feature descriptors. Finally, an ensemble Rotation Forest (RF) learning system is trained to predict PPIs based solely on protein sequence features. When applied to three benchmark data sets (Yeast, H. pylori, and an independent dataset), the proposed method achieves average accuracies of 98.38%, 89.75%, and 96.25%, respectively. To further evaluate prediction performance, we also compare the proposed method with other methods on the same benchmark data sets. The experimental results demonstrate that the proposed method consistently outperforms other state-of-the-art methods. Therefore, our method is effective and robust and can serve as a useful tool for exploring and discovering new relationships between proteins. A web server is publicly available at http://202.119.201.126:8888/PsePSSM/ for academic use.
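The feature-extraction step mentioned in the description can be illustrated with a minimal sketch. It assumes the commonly used pseudo-PSSM formulation (per-column means followed by lagged mean squared differences between rows); this record does not state the exact formula the authors used, and the function name `pse_pssm`, the `max_lag` parameter, and the toy matrix below are hypothetical. Any row normalization of the PSSM is assumed to have been done beforehand.

```python
from typing import List

Matrix = List[List[float]]


def pse_pssm(pssm: Matrix, max_lag: int) -> List[float]:
    """Return a fixed-length descriptor for an L x W score matrix.

    Assumed formulation (not confirmed by the record): the first block holds
    the per-column means; each following block holds, for one lag, the mean
    squared difference between rows that are `lag` positions apart.
    """
    length = len(pssm)
    if length == 0:
        raise ValueError("PSSM must have at least one row")
    width = len(pssm[0])
    if any(len(row) != width for row in pssm):
        raise ValueError("all PSSM rows must have the same width")
    if not 0 <= max_lag < length:
        raise ValueError("max_lag must satisfy 0 <= max_lag < sequence length")

    # Block 0: average score of each column over the whole sequence.
    features = [sum(row[j] for row in pssm) / length for j in range(width)]

    # Blocks 1..max_lag: sequence-order correlation at each lag.
    for lag in range(1, max_lag + 1):
        pairs = length - lag
        for j in range(width):
            total = sum((pssm[i][j] - pssm[i + lag][j]) ** 2 for i in range(pairs))
            features.append(total / pairs)
    return features


# Toy 4-residue matrix with 2 columns instead of 20, purely for illustration.
toy = [[1.0, 0.0], [3.0, 0.0], [1.0, 2.0], [3.0, 2.0]]
descriptor = pse_pssm(toy, max_lag=2)
print(len(descriptor))
```

With a real 20-column PSSM the descriptor has 20 × (1 + `max_lag`) entries regardless of sequence length, which is what lets proteins of different lengths be passed to a single classifier.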
format Online
Article
Text
id pubmed-5354898
institution National Center for Biotechnology Information
language English
publishDate 2016
publisher Impact Journals LLC
record_format MEDLINE/PubMed
spelling pubmed-5354898 2017-04-24 An ensemble approach for large-scale identification of protein-protein interactions using the alignments of multiple sequences Wang, Lei You, Zhu-Hong Chen, Xing Li, Jian-Qiang Yan, Xin Zhang, Wei Huang, Yu-An Oncotarget Research Paper
Impact Journals LLC 2016-12-22 /pmc/articles/PMC5354898/ /pubmed/28029645 http://dx.doi.org/10.18632/oncotarget.14103 Text en Copyright: © 2017 Wang et al. http://creativecommons.org/licenses/by/3.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
spellingShingle Research Paper
Wang, Lei
You, Zhu-Hong
Chen, Xing
Li, Jian-Qiang
Yan, Xin
Zhang, Wei
Huang, Yu-An
An ensemble approach for large-scale identification of protein-protein interactions using the alignments of multiple sequences
title An ensemble approach for large-scale identification of protein-protein interactions using the alignments of multiple sequences
title_full An ensemble approach for large-scale identification of protein-protein interactions using the alignments of multiple sequences
title_fullStr An ensemble approach for large-scale identification of protein-protein interactions using the alignments of multiple sequences
title_full_unstemmed An ensemble approach for large-scale identification of protein-protein interactions using the alignments of multiple sequences
title_short An ensemble approach for large-scale identification of protein-protein interactions using the alignments of multiple sequences
title_sort ensemble approach for large-scale identification of protein-protein interactions using the alignments of multiple sequences
topic Research Paper
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5354898/
https://www.ncbi.nlm.nih.gov/pubmed/28029645
http://dx.doi.org/10.18632/oncotarget.14103