GcForest-based compound-protein interaction prediction model and its application in discovering small-molecule drugs targeting CD47

Identifying compound–protein interactions plays a vital role in drug discovery. Artificial intelligence (AI), especially machine learning (ML) and deep learning (DL) algorithms, is playing an increasingly important role in compound–protein interaction (CPI) prediction. However, ML relies on learning f...

Bibliographic Details
Main authors: Shan, Wenying; Chen, Lvqi; Xu, Hao; Zhong, Qinghao; Xu, Yinqiu; Yao, Hequan; Lin, Kejiang; Li, Xuanyi
Format: Online Article Text
Language: English
Published: Frontiers Media S.A. 2023
Subjects: Chemistry
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10623438/
https://www.ncbi.nlm.nih.gov/pubmed/37927570
http://dx.doi.org/10.3389/fchem.2023.1292869
_version_ 1785130740535525376
author Shan, Wenying
Chen, Lvqi
Xu, Hao
Zhong, Qinghao
Xu, Yinqiu
Yao, Hequan
Lin, Kejiang
Li, Xuanyi
author_facet Shan, Wenying
Chen, Lvqi
Xu, Hao
Zhong, Qinghao
Xu, Yinqiu
Yao, Hequan
Lin, Kejiang
Li, Xuanyi
author_sort Shan, Wenying
collection PubMed
description Identifying compound–protein interactions plays a vital role in drug discovery. Artificial intelligence (AI), especially machine learning (ML) and deep learning (DL) algorithms, is playing an increasingly important role in compound–protein interaction (CPI) prediction. However, ML relies on learning from large sample data, and the CPI data available for a specific target are often scarce. To overcome this dilemma, we propose a virtual screening model in which word2vec is used as an embedding tool to generate low-dimensional vectors from the SMILES strings of compounds and the amino acid sequences of proteins, and a modified multi-grained cascade forest (gcForest) is used as the classifier. The proposed method can construct a model from raw data and adjust model complexity according to the scale of the dataset, which suits small-scale datasets in particular, and it is robust, with few hyper-parameters and without over-fitting. We found that the proposed model is superior to other CPI prediction models and performs well on the constructed challenging dataset. Finally, we predicted 2 new inhibitors of cluster of differentiation 47 (CD47), which has few known inhibitors. The IC50 values for the enzyme activities of these 2 new small-molecule inhibitors targeting the CD47-SIRPα interaction are 3.57 and 4.79 μM, respectively. These results demonstrate the competence of this concise but efficient tool for CPI prediction.
format Online
Article
Text
id pubmed-10623438
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher Frontiers Media S.A.
record_format MEDLINE/PubMed
spelling pubmed-10623438 2023-11-04 GcForest-based compound-protein interaction prediction model and its application in discovering small-molecule drugs targeting CD47 Shan, Wenying Chen, Lvqi Xu, Hao Zhong, Qinghao Xu, Yinqiu Yao, Hequan Lin, Kejiang Li, Xuanyi Front Chem Chemistry Frontiers Media S.A. 2023-10-20 /pmc/articles/PMC10623438/ /pubmed/37927570 http://dx.doi.org/10.3389/fchem.2023.1292869 Text en Copyright © 2023 Shan, Chen, Xu, Zhong, Xu, Yao, Lin and Li.
https://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
spellingShingle Chemistry
Shan, Wenying
Chen, Lvqi
Xu, Hao
Zhong, Qinghao
Xu, Yinqiu
Yao, Hequan
Lin, Kejiang
Li, Xuanyi
GcForest-based compound-protein interaction prediction model and its application in discovering small-molecule drugs targeting CD47
title GcForest-based compound-protein interaction prediction model and its application in discovering small-molecule drugs targeting CD47
title_full GcForest-based compound-protein interaction prediction model and its application in discovering small-molecule drugs targeting CD47
title_fullStr GcForest-based compound-protein interaction prediction model and its application in discovering small-molecule drugs targeting CD47
title_full_unstemmed GcForest-based compound-protein interaction prediction model and its application in discovering small-molecule drugs targeting CD47
title_short GcForest-based compound-protein interaction prediction model and its application in discovering small-molecule drugs targeting CD47
title_sort gcforest-based compound-protein interaction prediction model and its application in discovering small-molecule drugs targeting cd47
topic Chemistry
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10623438/
https://www.ncbi.nlm.nih.gov/pubmed/37927570
http://dx.doi.org/10.3389/fchem.2023.1292869