A Novel Ensemble Learning-Based Computational Method to Predict Protein-Protein Interactions from Protein Primary Sequences

SIMPLE SUMMARY: Protein–protein interactions (PPIs) play a central role in the evolution and progression of various biological processes. In this article, we constructed a novel ensemble-learning-based model to predict potential PPIs, which only utilized the protein sequence information. The present...

Bibliographic Details
Main authors: Pan, Jie; Wang, Shiwei; Yu, Changqing; Li, Liping; You, Zhuhong; Sun, Yanmei
Format: Online Article Text
Language: English
Published: MDPI 2022
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9139052/
https://www.ncbi.nlm.nih.gov/pubmed/35625503
http://dx.doi.org/10.3390/biology11050775
author Pan, Jie
Wang, Shiwei
Yu, Changqing
Li, Liping
You, Zhuhong
Sun, Yanmei
collection PubMed
description SIMPLE SUMMARY: Protein–protein interactions (PPIs) play a central role in the evolution and progression of various biological processes. In this article, we constructed a novel ensemble-learning-based model that predicts potential PPIs using only protein sequence information. The method uses the Discrete Hilbert transform to extract amino acid sequence information from position-specific scoring matrices. The extracted features are then fed into a rotation forest for training and prediction. Applied to three datasets (Yeast, Human, and Oryza sativa), the method achieved high prediction performance. Furthermore, the comparison results indicated that our computational model is effective and robust in predicting potential PPI pairs.
ABSTRACT: Protein–protein interactions (PPIs) are crucial for understanding cellular processes, including signal cascades, DNA transcription, metabolic cycles, and repair. In the past decade, a multitude of high-throughput methods have been introduced to detect PPIs. However, these techniques are time-consuming and laborious, and they suffer from high false negative rates. Therefore, there is a great need for new computational methods as supplemental tools for PPI prediction. In this article, we present a novel sequence-based model to predict PPIs that combines the Discrete Hilbert transform (DHT) and Rotation Forest (RoF). The method has three stages. First, each amino acid sequence is transformed into a Position-Specific Scoring Matrix (PSSM), which carries rich information about protein evolution. Then, a 400-dimensional DHT descriptor is constructed for each protein pair. Finally, these feature descriptors are fed to the RoF classifier to identify the potential PPI class. On the Yeast, Human, and Oryza sativa PPI datasets, the proposed model yielded prediction accuracies of 91.93%, 96.35%, and 94.24%, respectively. We also conducted numerous experiments on cross-species PPI datasets, where the method also showed strong predictive capacity. To further assess the prediction ability of the proposed approach, we compared RoF with four powerful classifiers: Support Vector Machine (SVM), Random Forest (RF), K-nearest Neighbor (KNN), and AdaBoost. We also compared it with several existing methods. These comprehensive experimental results further confirm the effectiveness and feasibility of the proposed approach. In future work, we hope it can serve as a supplemental tool for proteomics analysis.
format Online
Article
Text
id pubmed-9139052
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher MDPI
record_format MEDLINE/PubMed
spelling pubmed-9139052 2022-05-28 A Novel Ensemble Learning-Based Computational Method to Predict Protein-Protein Interactions from Protein Primary Sequences Pan, Jie Wang, Shiwei Yu, Changqing Li, Liping You, Zhuhong Sun, Yanmei Biology (Basel) Article MDPI 2022-05-19 /pmc/articles/PMC9139052/ /pubmed/35625503 http://dx.doi.org/10.3390/biology11050775 Text en © 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
title A Novel Ensemble Learning-Based Computational Method to Predict Protein-Protein Interactions from Protein Primary Sequences
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9139052/
https://www.ncbi.nlm.nih.gov/pubmed/35625503
http://dx.doi.org/10.3390/biology11050775