A Novel Ensemble Learning-Based Computational Method to Predict Protein-Protein Interactions from Protein Primary Sequences
SIMPLE SUMMARY: Protein–protein interactions (PPIs) play a central role in the evolution and progression of various biological processes. In this article, we constructed a novel ensemble-learning-based model to predict potential PPIs, which only utilized the protein sequence information. The present...
Main authors: | Pan, Jie; Wang, Shiwei; Yu, Changqing; Li, Liping; You, Zhuhong; Sun, Yanmei |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | MDPI 2022 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9139052/ https://www.ncbi.nlm.nih.gov/pubmed/35625503 http://dx.doi.org/10.3390/biology11050775 |
_version_ | 1784714768887578624 |
---|---|
author | Pan, Jie Wang, Shiwei Yu, Changqing Li, Liping You, Zhuhong Sun, Yanmei |
author_facet | Pan, Jie Wang, Shiwei Yu, Changqing Li, Liping You, Zhuhong Sun, Yanmei |
author_sort | Pan, Jie |
collection | PubMed |
description | SIMPLE SUMMARY: Protein–protein interactions (PPIs) play a central role in the evolution and progression of various biological processes. In this article, we constructed a novel ensemble-learning-based model that predicts potential PPIs using only protein sequence information. The presented method used the discrete Hilbert transform to extract amino acid sequence information from position-specific scoring matrices. These extracted features were then fed into a rotation forest for training and prediction. Applied to three datasets (Yeast, Human, and Oryza sativa) for detecting PPIs, our method obtained excellent prediction performance. Furthermore, the comparison results indicated that our computational model is effective and robust in predicting potential PPI pairs. ABSTRACT: Protein–protein interactions (PPIs) are crucial for understanding cellular processes, including signal cascades, DNA transcription, metabolic cycles, and repair. In the past decade, a multitude of high-throughput methods have been introduced to detect PPIs. However, these techniques are time-consuming and laborious, and they suffer from high false-negative rates. Therefore, there is a great need for new computational methods as supplemental tools for PPI prediction. In this article, we present a novel sequence-based model to predict PPIs that combines the Discrete Hilbert Transform (DHT) and Rotation Forest (RoF). The method has three stages. First, each amino acid sequence was converted into a Position-Specific Scoring Matrix (PSSM), which captures rich information about protein evolution. Then, a 400-dimensional DHT descriptor was constructed for each protein pair. Finally, these feature descriptors were fed to the RoF classifier to identify potential PPIs. 
On the Yeast, Human, and Oryza sativa PPI datasets, the proposed model yielded prediction accuracies of 91.93%, 96.35%, and 94.24%, respectively. We also conducted experiments on cross-species PPI datasets, where the method again showed strong predictive capacity. To further assess the prediction ability of the proposed approach, we compared RoF with four powerful classifiers: Support Vector Machine (SVM), Random Forest (RF), K-nearest Neighbor (KNN), and AdaBoost. We also compared it with existing state-of-the-art methods. These comprehensive experimental results further confirm the effectiveness and feasibility of the proposed approach. In future work, we hope it can serve as a supplemental tool for proteomics analysis. |
format | Online Article Text |
id | pubmed-9139052 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2022 |
publisher | MDPI |
record_format | MEDLINE/PubMed |
spelling | pubmed-9139052 2022-05-28 A Novel Ensemble Learning-Based Computational Method to Predict Protein-Protein Interactions from Protein Primary Sequences Pan, Jie Wang, Shiwei Yu, Changqing Li, Liping You, Zhuhong Sun, Yanmei Biology (Basel) Article SIMPLE SUMMARY: Protein–protein interactions (PPIs) play a central role in the evolution and progression of various biological processes. In this article, we constructed a novel ensemble-learning-based model that predicts potential PPIs using only protein sequence information. The presented method used the discrete Hilbert transform to extract amino acid sequence information from position-specific scoring matrices. These extracted features were then fed into a rotation forest for training and prediction. Applied to three datasets (Yeast, Human, and Oryza sativa) for detecting PPIs, our method obtained excellent prediction performance. Furthermore, the comparison results indicated that our computational model is effective and robust in predicting potential PPI pairs. ABSTRACT: Protein–protein interactions (PPIs) are crucial for understanding cellular processes, including signal cascades, DNA transcription, metabolic cycles, and repair. In the past decade, a multitude of high-throughput methods have been introduced to detect PPIs. However, these techniques are time-consuming and laborious, and they suffer from high false-negative rates. Therefore, there is a great need for new computational methods as supplemental tools for PPI prediction. In this article, we present a novel sequence-based model to predict PPIs that combines the Discrete Hilbert Transform (DHT) and Rotation Forest (RoF). The method has three stages. First, each amino acid sequence was converted into a Position-Specific Scoring Matrix (PSSM), which captures rich information about protein evolution. Then, a 400-dimensional DHT descriptor was constructed for each protein pair. 
Finally, these feature descriptors were fed to the RoF classifier to identify potential PPIs. On the Yeast, Human, and Oryza sativa PPI datasets, the proposed model yielded prediction accuracies of 91.93%, 96.35%, and 94.24%, respectively. We also conducted experiments on cross-species PPI datasets, where the method again showed strong predictive capacity. To further assess the prediction ability of the proposed approach, we compared RoF with four powerful classifiers: Support Vector Machine (SVM), Random Forest (RF), K-nearest Neighbor (KNN), and AdaBoost. We also compared it with existing state-of-the-art methods. These comprehensive experimental results further confirm the effectiveness and feasibility of the proposed approach. In future work, we hope it can serve as a supplemental tool for proteomics analysis. MDPI 2022-05-19 /pmc/articles/PMC9139052/ /pubmed/35625503 http://dx.doi.org/10.3390/biology11050775 Text en © 2022 by the authors. https://creativecommons.org/licenses/by/4.0/ Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/). |
spellingShingle | Article Pan, Jie Wang, Shiwei Yu, Changqing Li, Liping You, Zhuhong Sun, Yanmei A Novel Ensemble Learning-Based Computational Method to Predict Protein-Protein Interactions from Protein Primary Sequences |
title | A Novel Ensemble Learning-Based Computational Method to Predict Protein-Protein Interactions from Protein Primary Sequences |
title_full | A Novel Ensemble Learning-Based Computational Method to Predict Protein-Protein Interactions from Protein Primary Sequences |
title_fullStr | A Novel Ensemble Learning-Based Computational Method to Predict Protein-Protein Interactions from Protein Primary Sequences |
title_full_unstemmed | A Novel Ensemble Learning-Based Computational Method to Predict Protein-Protein Interactions from Protein Primary Sequences |
title_short | A Novel Ensemble Learning-Based Computational Method to Predict Protein-Protein Interactions from Protein Primary Sequences |
title_sort | novel ensemble learning-based computational method to predict protein-protein interactions from protein primary sequences |
topic | Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9139052/ https://www.ncbi.nlm.nih.gov/pubmed/35625503 http://dx.doi.org/10.3390/biology11050775 |
work_keys_str_mv | AT panjie anovelensemblelearningbasedcomputationalmethodtopredictproteinproteininteractionsfromproteinprimarysequences AT wangshiwei anovelensemblelearningbasedcomputationalmethodtopredictproteinproteininteractionsfromproteinprimarysequences AT yuchangqing anovelensemblelearningbasedcomputationalmethodtopredictproteinproteininteractionsfromproteinprimarysequences AT liliping anovelensemblelearningbasedcomputationalmethodtopredictproteinproteininteractionsfromproteinprimarysequences AT youzhuhong anovelensemblelearningbasedcomputationalmethodtopredictproteinproteininteractionsfromproteinprimarysequences AT sunyanmei anovelensemblelearningbasedcomputationalmethodtopredictproteinproteininteractionsfromproteinprimarysequences AT panjie novelensemblelearningbasedcomputationalmethodtopredictproteinproteininteractionsfromproteinprimarysequences AT wangshiwei novelensemblelearningbasedcomputationalmethodtopredictproteinproteininteractionsfromproteinprimarysequences AT yuchangqing novelensemblelearningbasedcomputationalmethodtopredictproteinproteininteractionsfromproteinprimarysequences AT liliping novelensemblelearningbasedcomputationalmethodtopredictproteinproteininteractionsfromproteinprimarysequences AT youzhuhong novelensemblelearningbasedcomputationalmethodtopredictproteinproteininteractionsfromproteinprimarysequences AT sunyanmei novelensemblelearningbasedcomputationalmethodtopredictproteinproteininteractionsfromproteinprimarysequences |
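
The abstract in this record describes turning each protein's PSSM into a DHT descriptor and building a 400-dimensional vector per protein pair. The record does not say how the transform output is pooled to a fixed length, so the following is only a minimal sketch under stated assumptions: the frequency-domain definition of the discrete Hilbert transform is used, and the pooling step (mean over 10 equal segments of each of the 20 PSSM columns, giving 200 values per protein and 400 per pair) is a hypothetical placeholder, not the authors' procedure. The function names `discrete_hilbert`, `protein_descriptor`, and `pair_descriptor` are illustrative. The Rotation Forest classifier is not shown.

```python
import cmath
import math


def discrete_hilbert(x):
    """Discrete Hilbert transform of a real sequence (frequency-domain definition).

    Positive frequencies are multiplied by -j, negative frequencies by +j,
    and the DC and Nyquist bins are zeroed. Uses a naive O(n^2) DFT.
    """
    n = len(x)
    spectrum = [
        sum(x[t] * cmath.exp(-2j * math.pi * f * t / n) for t in range(n))
        for f in range(n)
    ]
    shifted = []
    for f, value in enumerate(spectrum):
        if f == 0 or 2 * f == n:      # DC or Nyquist bin
            shifted.append(0j)
        elif 2 * f < n:               # positive frequency
            shifted.append(-1j * value)
        else:                         # negative frequency
            shifted.append(1j * value)
    return [
        sum(shifted[f] * cmath.exp(2j * math.pi * f * t / n) for f in range(n)).real / n
        for t in range(n)
    ]


def protein_descriptor(pssm, segments=10):
    """Fixed-length descriptor from an L x 20 PSSM (list of rows).

    ASSUMPTION: segment-mean pooling is a placeholder chosen only so that
    20 columns x 10 segments = 200 values; the record does not specify it.
    """
    length = len(pssm)
    if length < segments:
        raise ValueError("sequence shorter than the number of segments")
    features = []
    for c in range(len(pssm[0])):
        transformed = discrete_hilbert([row[c] for row in pssm])
        for s in range(segments):
            start = s * length // segments
            end = (s + 1) * length // segments
            chunk = transformed[start:end]
            features.append(sum(chunk) / len(chunk))
    return features


def pair_descriptor(pssm_a, pssm_b):
    """Concatenate the two per-protein descriptors (200 + 200 = 400 values)."""
    return protein_descriptor(pssm_a) + protein_descriptor(pssm_b)
```

A vector produced this way could be passed to any classifier that accepts fixed-length numeric input. Real PSSMs would come from a profile search tool; the sketch only assumes a list of 20-column rows.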