GTB-PPI: Predict Protein–protein Interactions Based on L1-regularized Logistic Regression and Gradient Tree Boosting

Protein–protein interactions (PPIs) are of great importance for understanding genetic mechanisms, delineating disease pathogenesis, and guiding drug design. With the growth of PPI data and the development of machine learning technologies, the prediction and identification of PPIs have become a research hotspot in...

Bibliographic Details
Main Authors: Yu, Bin; Chen, Cheng; Zhou, Hongyan; Liu, Bingqiang; Ma, Qin
Format: Online Article Text
Language: English
Published: Elsevier 2020
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8377384/
https://www.ncbi.nlm.nih.gov/pubmed/33515750
http://dx.doi.org/10.1016/j.gpb.2021.01.001
author Yu, Bin
Chen, Cheng
Zhou, Hongyan
Liu, Bingqiang
Ma, Qin
author_facet Yu, Bin
Chen, Cheng
Zhou, Hongyan
Liu, Bingqiang
Ma, Qin
author_sort Yu, Bin
collection PubMed
description Protein–protein interactions (PPIs) are of great importance for understanding genetic mechanisms, delineating disease pathogenesis, and guiding drug design. With the growth of PPI data and the development of machine learning technologies, the prediction and identification of PPIs have become a research hotspot in proteomics. In this study, we propose a new prediction pipeline for PPIs based on gradient tree boosting (GTB). First, the initial feature vector is extracted by fusing pseudo amino acid composition (PseAAC), pseudo position-specific scoring matrix (PsePSSM), reduced sequence and index-vectors (RSIV), and autocorrelation descriptor (AD). Second, to remove redundancy and noise, we employ L1-regularized logistic regression (L1-RLR) to select an optimal feature subset. Finally, the GTB-PPI model is constructed. Five-fold cross-validation showed that GTB-PPI achieved accuracies of 95.15% and 90.47% on the Saccharomyces cerevisiae and Helicobacter pylori datasets, respectively. In addition, GTB-PPI could be applied to predict the independent test datasets for Caenorhabditis elegans, Escherichia coli, Homo sapiens, and Mus musculus, the one-core PPI network for CD9, and the crossover PPI network for the Wnt-related signaling pathways. The results show that GTB-PPI can significantly improve the accuracy of PPI prediction. The code and datasets of GTB-PPI can be downloaded from https://github.com/QUST-AIBBDRC/GTB-PPI/.
format Online
Article
Text
id pubmed-8377384
institution National Center for Biotechnology Information
language English
publishDate 2020
publisher Elsevier
record_format MEDLINE/PubMed
spelling pubmed-8377384
2021-08-26
GTB-PPI: Predict Protein–protein Interactions Based on L1-regularized Logistic Regression and Gradient Tree Boosting
Yu, Bin
Chen, Cheng
Zhou, Hongyan
Liu, Bingqiang
Ma, Qin
Genomics Proteomics Bioinformatics
Method
Protein–protein interactions (PPIs) are of great importance for understanding genetic mechanisms, delineating disease pathogenesis, and guiding drug design. With the growth of PPI data and the development of machine learning technologies, the prediction and identification of PPIs have become a research hotspot in proteomics. In this study, we propose a new prediction pipeline for PPIs based on gradient tree boosting (GTB). First, the initial feature vector is extracted by fusing pseudo amino acid composition (PseAAC), pseudo position-specific scoring matrix (PsePSSM), reduced sequence and index-vectors (RSIV), and autocorrelation descriptor (AD). Second, to remove redundancy and noise, we employ L1-regularized logistic regression (L1-RLR) to select an optimal feature subset. Finally, the GTB-PPI model is constructed. Five-fold cross-validation showed that GTB-PPI achieved accuracies of 95.15% and 90.47% on the Saccharomyces cerevisiae and Helicobacter pylori datasets, respectively. In addition, GTB-PPI could be applied to predict the independent test datasets for Caenorhabditis elegans, Escherichia coli, Homo sapiens, and Mus musculus, the one-core PPI network for CD9, and the crossover PPI network for the Wnt-related signaling pathways. The results show that GTB-PPI can significantly improve the accuracy of PPI prediction. The code and datasets of GTB-PPI can be downloaded from https://github.com/QUST-AIBBDRC/GTB-PPI/.
Elsevier
2020-10
2021-01-27
/pmc/articles/PMC8377384/
/pubmed/33515750
http://dx.doi.org/10.1016/j.gpb.2021.01.001
Text
en
https://creativecommons.org/licenses/by/4.0/
This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).
spellingShingle Method
Yu, Bin
Chen, Cheng
Zhou, Hongyan
Liu, Bingqiang
Ma, Qin
GTB-PPI: Predict Protein–protein Interactions Based on L1-regularized Logistic Regression and Gradient Tree Boosting
title GTB-PPI: Predict Protein–protein Interactions Based on L1-regularized Logistic Regression and Gradient Tree Boosting
title_sort gtb-ppi: predict protein–protein interactions based on l1-regularized logistic regression and gradient tree boosting
topic Method
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8377384/
https://www.ncbi.nlm.nih.gov/pubmed/33515750
http://dx.doi.org/10.1016/j.gpb.2021.01.001
work_keys_str_mv AT yubin gtbppipredictproteinproteininteractionsbasedonl1regularizedlogisticregressionandgradienttreeboosting
AT chencheng gtbppipredictproteinproteininteractionsbasedonl1regularizedlogisticregressionandgradienttreeboosting
AT zhouhongyan gtbppipredictproteinproteininteractionsbasedonl1regularizedlogisticregressionandgradienttreeboosting
AT liubingqiang gtbppipredictproteinproteininteractionsbasedonl1regularizedlogisticregressionandgradienttreeboosting
AT maqin gtbppipredictproteinproteininteractionsbasedonl1regularizedlogisticregressionandgradienttreeboosting