
Predicting Protein-Protein Interactions via Random Ferns with Evolutionary Matrix Representation

Protein-protein interactions (PPIs) play a crucial role in disease pathogenesis, genetic mechanisms, drug design, and other biochemical processes; identifying PPIs is therefore of great importance. With the rapid development of high-throughput sequencing technology, a lar...


Bibliographic Details
Main authors: Li, Yang; Wang, Zheng; You, Zhu-Hong; Li, Li-Ping; Hu, Xuegang
Format: Online Article Text
Language: English
Published: Hindawi 2022
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8888042/
https://www.ncbi.nlm.nih.gov/pubmed/35242211
http://dx.doi.org/10.1155/2022/7191684
author Li, Yang
Wang, Zheng
You, Zhu-Hong
Li, Li-Ping
Hu, Xuegang
author_facet Li, Yang
Wang, Zheng
You, Zhu-Hong
Li, Li-Ping
Hu, Xuegang
author_sort Li, Yang
collection PubMed
description Protein-protein interactions (PPIs) play a crucial role in disease pathogenesis, genetic mechanisms, drug design, and other biochemical processes; identifying PPIs is therefore of great importance. With the rapid development of high-throughput sequencing technology, a large amount of PPI sequence data has accumulated. Researchers have designed many experimental methods to detect PPIs using these sequence data, and the prediction of PPIs has become a research hotspot in proteomics. However, because traditional experimental methods are both time-consuming and costly, it is difficult to analyze and predict the massive amount of PPI data quickly and accurately. To address these issues, many machine-learning-based computational systems have been applied to PPI prediction, improving the overall recognition rate. In this paper, a novel and efficient computational method is presented that implements a protein interaction prediction system using only protein sequence information. First, the Position-Specific Iterated Basic Local Alignment Search Tool (PSI-BLAST) was employed to generate a position-specific scoring matrix (PSSM) containing protein evolutionary information from the initial protein sequence. Second, we used a novel feature representation scheme, MatFLDA, to extract the essential information of the PSSM for each protein sequence, and obtained five training and five testing datasets by adopting five-fold cross-validation. Finally, the random ferns (RFs) classifier was employed to infer the interactions among proteins, and a model called MatFLDA_RFs was developed. The proposed MatFLDA_RFs model achieved 95.03% average accuracy on the Yeast dataset and 85.35% average accuracy on the H. pylori dataset, outperforming other existing computational methods. The experimental results indicate that the proposed method yields better PPI prediction results, providing an effective tool for the detection of new PPIs and the in-depth study of proteomics. We also developed a web server for the proposed model to predict protein-protein interactions, which is freely accessible online at http://120.77.11.78:5001/webserver/MatFLDA_RFs.
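The classification and validation steps described above can be illustrated with a minimal sketch. This is not the authors' MatFLDA_RFs implementation: the `RandomFerns` class, its hyperparameters (`n_ferns`, `depth`), the thresholded single-feature tests, and the `five_fold_accuracy` helper are all hypothetical, and synthetic Gaussian features stand in for the MatFLDA descriptors derived from PSSMs. It only shows the general idea of a random ferns classifier (a semi-naive Bayes model over random binary tests) evaluated with five-fold cross-validation.

```python
import numpy as np


class RandomFerns:
    """Illustrative random ferns classifier (hypothetical, not the paper's code)."""

    def __init__(self, n_ferns=50, depth=5, seed=0):
        self.n_ferns = n_ferns
        self.depth = depth
        self.rng = np.random.default_rng(seed)

    def _leaf_index(self, X):
        # Each fern applies `depth` binary tests (feature > threshold) and
        # packs the outcomes into one integer leaf index in [0, 2**depth).
        bits = X[:, self.features] > self.thresholds   # (n, n_ferns, depth)
        weights = 1 << np.arange(self.depth)
        return (bits * weights).sum(axis=2)            # (n, n_ferns)

    def fit(self, X, y):
        X = np.asarray(X, dtype=float)
        y = np.asarray(y)
        self.classes_ = np.unique(y)
        # Assumption: tests are random features with random thresholds
        # drawn uniformly within each feature's observed range.
        self.features = self.rng.integers(
            0, X.shape[1], size=(self.n_ferns, self.depth))
        lo, hi = X.min(axis=0), X.max(axis=0)
        self.thresholds = self.rng.uniform(lo[self.features], hi[self.features])
        leaves = self._leaf_index(X)
        # counts[c, f, l]: class-c samples reaching leaf l of fern f,
        # initialised to 1 (Laplace smoothing).
        counts = np.ones((len(self.classes_), self.n_ferns, 2 ** self.depth))
        ferns = np.arange(self.n_ferns)
        for c, label in enumerate(self.classes_):
            for row in leaves[y == label]:
                counts[c, ferns, row] += 1
        self.log_prob = np.log(counts / counts.sum(axis=2, keepdims=True))
        self.log_prior = np.log(
            np.array([(y == label).mean() for label in self.classes_]))
        return self

    def predict(self, X):
        leaves = self._leaf_index(np.asarray(X, dtype=float))
        ferns = np.arange(self.n_ferns)
        # scores[c, i] = log prior + sum over ferns of log P(leaf | class c)
        scores = self.log_prob[:, ferns, leaves].sum(axis=2)
        scores = scores + self.log_prior[:, None]
        return self.classes_[scores.argmax(axis=0)]


def five_fold_accuracy(X, y, n_folds=5, seed=0):
    """Mean accuracy over a shuffled n-fold split (five folds by default)."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), n_folds)
    accuracies = []
    for k in range(n_folds):
        test = folds[k]
        train = np.concatenate([folds[j] for j in range(n_folds) if j != k])
        model = RandomFerns(seed=seed + k).fit(X[train], y[train])
        accuracies.append((model.predict(X[test]) == y[test]).mean())
    return float(np.mean(accuracies))


# Synthetic stand-in data: two Gaussian classes with 20 features each.
rng = np.random.default_rng(42)
X = np.vstack([rng.normal(0.0, 1.0, (200, 20)),
               rng.normal(1.5, 1.0, (200, 20))])
y = np.array([0] * 200 + [1] * 200)
mean_accuracy = five_fold_accuracy(X, y)
print(f"mean five-fold accuracy: {mean_accuracy:.3f}")
```

Averaging the per-fold accuracies mirrors how an "average accuracy" over five training/testing splits is typically reported; the accuracy printed here reflects only the synthetic data and says nothing about the Yeast or H. pylori results.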
format Online
Article
Text
id pubmed-8888042
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher Hindawi
record_format MEDLINE/PubMed
spelling pubmed-8888042 2022-03-02 Predicting Protein-Protein Interactions via Random Ferns with Evolutionary Matrix Representation Li, Yang Wang, Zheng You, Zhu-Hong Li, Li-Ping Hu, Xuegang Comput Math Methods Med Research Article Hindawi 2022-02-22 /pmc/articles/PMC8888042/ /pubmed/35242211 http://dx.doi.org/10.1155/2022/7191684 Text en Copyright © 2022 Yang Li et al. https://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
spellingShingle Research Article
Li, Yang
Wang, Zheng
You, Zhu-Hong
Li, Li-Ping
Hu, Xuegang
Predicting Protein-Protein Interactions via Random Ferns with Evolutionary Matrix Representation
title Predicting Protein-Protein Interactions via Random Ferns with Evolutionary Matrix Representation
title_full Predicting Protein-Protein Interactions via Random Ferns with Evolutionary Matrix Representation
title_fullStr Predicting Protein-Protein Interactions via Random Ferns with Evolutionary Matrix Representation
title_full_unstemmed Predicting Protein-Protein Interactions via Random Ferns with Evolutionary Matrix Representation
title_short Predicting Protein-Protein Interactions via Random Ferns with Evolutionary Matrix Representation
title_sort predicting protein-protein interactions via random ferns with evolutionary matrix representation
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8888042/
https://www.ncbi.nlm.nih.gov/pubmed/35242211
http://dx.doi.org/10.1155/2022/7191684