
Predicting protein-protein interactions via multivariate mutual information of protein sequences

BACKGROUND: Protein-protein interactions (PPIs) are central to many biological processes. Many algorithms and methods have been developed to predict PPIs and protein interaction networks. However, the application of most existing methods is limited since they are difficult to compute and rely on...


Bibliographic Details
Main authors: Ding, Yijie, Tang, Jijun, Guo, Fei
Format: Online Article Text
Language: English
Published: BioMed Central 2016
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5039908/
https://www.ncbi.nlm.nih.gov/pubmed/27677692
http://dx.doi.org/10.1186/s12859-016-1253-9
_version_ 1782456146698698752
author Ding, Yijie
Tang, Jijun
Guo, Fei
author_facet Ding, Yijie
Tang, Jijun
Guo, Fei
author_sort Ding, Yijie
collection PubMed
description BACKGROUND: Protein-protein interactions (PPIs) are central to many biological processes. Many algorithms and methods have been developed to predict PPIs and protein interaction networks. However, the application of most existing methods is limited since they are difficult to compute and rely on a large number of homologous proteins and interaction marks of protein partners. In this paper, we propose a novel sequence-based approach with multivariate mutual information (MMI) of protein feature representation, for predicting PPIs via Random Forest (RF). METHODS: Our method constructs a 638-dimensional vector to represent each pair of proteins. First, we cluster the twenty standard amino acids into seven functional groups and transform protein sequences into encoding sequences. Then, we use a novel multivariate mutual information feature representation scheme, combined with normalized Moreau-Broto Autocorrelation, to extract features from protein sequence information. Finally, we feed the feature vectors into a Random Forest model to distinguish interaction pairs from non-interaction pairs. RESULTS: To evaluate the performance of our new method, we conduct several comprehensive tests for predicting PPIs. Experiments show that our method achieves better results than other outstanding methods for sequence-based PPI prediction. Applied to the S. cerevisiae PPI dataset, our method achieves 95.01 % accuracy and 92.67 % sensitivity. For the H. pylori PPI dataset, our method achieves 87.59 % accuracy and 86.81 % sensitivity. In addition, we test our method on three other important PPI networks: the one-core network, the multiple-core network, and the crossover network. CONCLUSIONS: Compared to the Conjoint Triad method, our method improves accuracy on these three networks by 6.25, 2.06, and 18.75 %, respectively. Our proposed method is a useful tool for future proteomics studies.
format Online
Article
Text
id pubmed-5039908
institution National Center for Biotechnology Information
language English
publishDate 2016
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-5039908 2016-10-05 Predicting protein-protein interactions via multivariate mutual information of protein sequences Ding, Yijie Tang, Jijun Guo, Fei BMC Bioinformatics Research Article BioMed Central 2016-09-27 /pmc/articles/PMC5039908/ /pubmed/27677692 http://dx.doi.org/10.1186/s12859-016-1253-9 Text en © The Author(s) 2016 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
spellingShingle Research Article
Ding, Yijie
Tang, Jijun
Guo, Fei
Predicting protein-protein interactions via multivariate mutual information of protein sequences
title Predicting protein-protein interactions via multivariate mutual information of protein sequences
title_full Predicting protein-protein interactions via multivariate mutual information of protein sequences
title_fullStr Predicting protein-protein interactions via multivariate mutual information of protein sequences
title_full_unstemmed Predicting protein-protein interactions via multivariate mutual information of protein sequences
title_short Predicting protein-protein interactions via multivariate mutual information of protein sequences
title_sort predicting protein-protein interactions via multivariate mutual information of protein sequences
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5039908/
https://www.ncbi.nlm.nih.gov/pubmed/27677692
http://dx.doi.org/10.1186/s12859-016-1253-9
work_keys_str_mv AT dingyijie predictingproteinproteininteractionsviamultivariatemutualinformationofproteinsequences
AT tangjijun predictingproteinproteininteractionsviamultivariatemutualinformationofproteinsequences
AT guofei predictingproteinproteininteractionsviamultivariatemutualinformationofproteinsequences
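
The pipeline summarized in the METHODS part of the description field (group the amino acids, encode the sequence, extract information-theoretic features, concatenate the features of both proteins) can be illustrated with a minimal Python sketch. Several things here are assumptions and not taken from the record: the seven-group partition is the one commonly used with the Conjoint Triad method, since the record does not list the groups; ordinary pairwise mutual information between adjacent group symbols stands in for the paper's multivariate mutual information; all function names are hypothetical; and the 16-value output is a toy stand-in for the 638-dimensional vector, with the Moreau-Broto autocorrelation and Random Forest steps left out.

```python
from collections import Counter
from math import log2

# ASSUMPTION: seven-group partition commonly used with the Conjoint Triad
# method; the catalog record does not say which groups the authors used.
GROUPS = ["AGV", "ILFP", "YMTS", "HNQW", "RK", "DE", "C"]
AA_TO_GROUP = {aa: str(i) for i, grp in enumerate(GROUPS, start=1) for aa in grp}


def encode(seq):
    """Map each standard amino acid to its group digit; skip other letters."""
    return "".join(AA_TO_GROUP[aa] for aa in seq.upper() if aa in AA_TO_GROUP)


def adjacent_mutual_information(encoded):
    """Mutual information (in bits) between neighbouring group symbols.

    Simplified stand-in for the multivariate mutual information of the paper.
    """
    pairs = list(zip(encoded, encoded[1:]))
    if not pairs:
        return 0.0
    n = len(pairs)
    joint = Counter(pairs)
    left = Counter(a for a, _ in pairs)
    right = Counter(b for _, b in pairs)
    return sum(
        (c / n) * log2((c * n) / (left[a] * right[b]))
        for (a, b), c in joint.items()
    )


def protein_features(seq):
    """Seven group frequencies followed by the adjacent-symbol MI (8 values)."""
    encoded = encode(seq)
    n = len(encoded)
    freqs = [encoded.count(str(g)) / n if n else 0.0 for g in range(1, 8)]
    return freqs + [adjacent_mutual_information(encoded)]


def pair_features(seq_a, seq_b):
    """Concatenate the per-protein features of a candidate interacting pair."""
    return protein_features(seq_a) + protein_features(seq_b)
```

A vector produced by `pair_features` could then be passed to any classifier. The record states that the authors use a Random Forest, but this sketch does not attempt to reproduce their configuration.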