GcForest-based compound-protein interaction prediction model and its application in discovering small-molecule drugs targeting CD47
Identifying compound-protein interactions plays a vital role in drug discovery. Artificial intelligence (AI), especially machine learning (ML) and deep learning (DL) algorithms, is playing an increasingly important role in compound-protein interaction (CPI) prediction. However, ML relies on learning f...
Main authors: | Shan, Wenying; Chen, Lvqi; Xu, Hao; Zhong, Qinghao; Xu, Yinqiu; Yao, Hequan; Lin, Kejiang; Li, Xuanyi |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Frontiers Media S.A. 2023 |
Subjects: | Chemistry |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10623438/ https://www.ncbi.nlm.nih.gov/pubmed/37927570 http://dx.doi.org/10.3389/fchem.2023.1292869 |
_version_ | 1785130740535525376 |
---|---|
author | Shan, Wenying; Chen, Lvqi; Xu, Hao; Zhong, Qinghao; Xu, Yinqiu; Yao, Hequan; Lin, Kejiang; Li, Xuanyi |
author_facet | Shan, Wenying; Chen, Lvqi; Xu, Hao; Zhong, Qinghao; Xu, Yinqiu; Yao, Hequan; Lin, Kejiang; Li, Xuanyi |
author_sort | Shan, Wenying |
collection | PubMed |
description | Identifying compound-protein interactions plays a vital role in drug discovery. Artificial intelligence (AI), especially machine learning (ML) and deep learning (DL) algorithms, is playing an increasingly important role in compound-protein interaction (CPI) prediction. However, ML relies on learning from large amounts of sample data, and only a small amount of CPI data is often available for a specific target. To overcome this dilemma, we propose a virtual screening model in which word2vec is used as an embedding tool to generate low-dimensional vectors from the SMILES of compounds and the amino acid sequences of proteins, and a modified gcForest, based on the multi-grained cascade forest, is used as the classifier. The proposed method can construct a model from raw data and adjust model complexity according to the scale of the dataset, especially for small-scale datasets, and it is robust, with few hyper-parameters and without over-fitting. We found that the proposed model is superior to other CPI prediction models and performs well on the constructed challenging dataset. Finally, we predicted 2 new inhibitors for cluster of differentiation 47 (CD47), which has few known inhibitors. The IC50 values of enzyme activities of these 2 new small-molecule inhibitors targeting the CD47-SIRPα interaction are 3.57 and 4.79 μM, respectively. These results fully demonstrate the competence of this concise but efficient tool for CPI prediction. |
format | Online Article Text |
id | pubmed-10623438 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2023 |
publisher | Frontiers Media S.A. |
record_format | MEDLINE/PubMed |
spelling | pubmed-106234382023-11-04 GcForest-based compound-protein interaction prediction model and its application in discovering small-molecule drugs targeting CD47 Shan, Wenying Chen, Lvqi Xu, Hao Zhong, Qinghao Xu, Yinqiu Yao, Hequan Lin, Kejiang Li, Xuanyi Front Chem Chemistry Identifying compound–protein interaction plays a vital role in drug discovery. Artificial intelligence (AI), especially machine learning (ML) and deep learning (DL) algorithms, are playing increasingly important roles in compound-protein interaction (CPI) prediction. However, ML relies on learning from large sample data. And the CPI for specific target often has a small amount of data available. To overcome the dilemma, we propose a virtual screening model, in which word2vec is used as an embedding tool to generate low-dimensional vectors of SMILES of compounds and amino acid sequences of proteins, and the modified multi-grained cascade forest based gcForest is used as the classifier. This proposed method is capable of constructing a model from raw data, adjusting model complexity according to the scale of datasets, especially for small scale datasets, and is robust with few hyper-parameters and without over-fitting. We found that the proposed model is superior to other CPI prediction models and performs well on the constructed challenging dataset. We finally predicted 2 new inhibitors for clusters of differentiation 47(CD47) which has few known inhibitors. The IC(50)s of enzyme activities of these 2 new small molecular inhibitors targeting CD47-SIRPα interaction are 3.57 and 4.79 μM respectively. These results fully demonstrate the competence of this concise but efficient tool for CPI prediction. Frontiers Media S.A. 2023-10-20 /pmc/articles/PMC10623438/ /pubmed/37927570 http://dx.doi.org/10.3389/fchem.2023.1292869 Text en Copyright © 2023 Shan, Chen, Xu, Zhong, Xu, Yao, Lin and Li. 
https://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms. |
spellingShingle | Chemistry Shan, Wenying Chen, Lvqi Xu, Hao Zhong, Qinghao Xu, Yinqiu Yao, Hequan Lin, Kejiang Li, Xuanyi GcForest-based compound-protein interaction prediction model and its application in discovering small-molecule drugs targeting CD47 |
title | GcForest-based compound-protein interaction prediction model and its application in discovering small-molecule drugs targeting CD47 |
title_full | GcForest-based compound-protein interaction prediction model and its application in discovering small-molecule drugs targeting CD47 |
title_fullStr | GcForest-based compound-protein interaction prediction model and its application in discovering small-molecule drugs targeting CD47 |
title_full_unstemmed | GcForest-based compound-protein interaction prediction model and its application in discovering small-molecule drugs targeting CD47 |
title_short | GcForest-based compound-protein interaction prediction model and its application in discovering small-molecule drugs targeting CD47 |
title_sort | gcforest-based compound-protein interaction prediction model and its application in discovering small-molecule drugs targeting cd47 |
topic | Chemistry |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10623438/ https://www.ncbi.nlm.nih.gov/pubmed/37927570 http://dx.doi.org/10.3389/fchem.2023.1292869 |
work_keys_str_mv | AT shanwenying gcforestbasedcompoundproteininteractionpredictionmodelanditsapplicationindiscoveringsmallmoleculedrugstargetingcd47 AT chenlvqi gcforestbasedcompoundproteininteractionpredictionmodelanditsapplicationindiscoveringsmallmoleculedrugstargetingcd47 AT xuhao gcforestbasedcompoundproteininteractionpredictionmodelanditsapplicationindiscoveringsmallmoleculedrugstargetingcd47 AT zhongqinghao gcforestbasedcompoundproteininteractionpredictionmodelanditsapplicationindiscoveringsmallmoleculedrugstargetingcd47 AT xuyinqiu gcforestbasedcompoundproteininteractionpredictionmodelanditsapplicationindiscoveringsmallmoleculedrugstargetingcd47 AT yaohequan gcforestbasedcompoundproteininteractionpredictionmodelanditsapplicationindiscoveringsmallmoleculedrugstargetingcd47 AT linkejiang gcforestbasedcompoundproteininteractionpredictionmodelanditsapplicationindiscoveringsmallmoleculedrugstargetingcd47 AT lixuanyi gcforestbasedcompoundproteininteractionpredictionmodelanditsapplicationindiscoveringsmallmoleculedrugstargetingcd47 |
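
The description field says that word2vec turns compound SMILES strings and protein amino acid sequences into low-dimensional vectors, which a gcForest classifier then consumes. The sketch below illustrates only the feature-building step, and only under assumptions this record does not confirm: overlapping 3-character "words" as the tokenization, mean pooling of word vectors, and small hand-written lookup tables standing in for a trained word2vec model. The function names (`kmer_words`, `mean_pool`, `pair_features`) are hypothetical, and the classifier itself is not shown.

```python
from typing import Dict, List, Sequence


def kmer_words(sequence: str, k: int = 3) -> List[str]:
    """Split a string into overlapping k-character 'words'.

    Assumption: the record does not state how sequences are tokenized.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    if len(sequence) < k:
        # A sequence shorter than k becomes a single word (or none if empty).
        return [sequence] if sequence else []
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]


def mean_pool(words: Sequence[str], table: Dict[str, List[float]], dim: int) -> List[float]:
    """Average the vectors of the words found in `table`; unknown words are skipped."""
    vectors = [table[w] for w in words if w in table]
    if not vectors:
        return [0.0] * dim
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def pair_features(
    smiles: str,
    protein: str,
    smiles_table: Dict[str, List[float]],
    protein_table: Dict[str, List[float]],
    dim: int,
    k: int = 3,
) -> List[float]:
    """Concatenate the pooled compound vector and the pooled protein vector."""
    compound_vec = mean_pool(kmer_words(smiles, k), smiles_table, dim)
    protein_vec = mean_pool(kmer_words(protein, k), protein_table, dim)
    return compound_vec + protein_vec


# Toy lookup tables (hypothetical values standing in for trained embeddings).
toy_smiles_table = {"CCO": [1.0, 0.0], "COC": [0.0, 1.0]}
toy_protein_table = {"MWP": [2.0, 4.0]}

features = pair_features("CCOC", "MWPL", toy_smiles_table, toy_protein_table, dim=2)
print(features)
```

In a real pipeline the lookup tables would presumably come from embeddings trained on large SMILES and protein corpora, and each concatenated vector would be one input row for the classifier.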