ELECTRA-DTA: a new compound-protein binding affinity prediction model based on the contextualized sequence encoding

MOTIVATION: Drug-target binding affinity (DTA) reflects the strength of the drug-target interaction; therefore, predicting the DTA can considerably benefit drug discovery by narrowing the search space and pruning drug-target (DT) pairs with low binding affinity scores. Representation learning using...

Bibliographic Details
Main Authors: Wang, Junjie, Wen, NaiFeng, Wang, Chunyu, Zhao, Lingling, Cheng, Liang
Format: Online Article Text
Language: English
Published: Springer International Publishing 2022
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8922401/
https://www.ncbi.nlm.nih.gov/pubmed/35292100
http://dx.doi.org/10.1186/s13321-022-00591-x
author Wang, Junjie
Wen, NaiFeng
Wang, Chunyu
Zhao, Lingling
Cheng, Liang
collection PubMed
description MOTIVATION: Drug-target binding affinity (DTA) reflects the strength of the drug-target interaction; therefore, predicting the DTA can considerably benefit drug discovery by narrowing the search space and pruning drug-target (DT) pairs with low binding affinity scores. Representation learning using deep neural networks has achieved promising performance compared with traditional machine learning methods; hence, extensive research efforts have been made in learning the feature representation of proteins and compounds. However, such feature representation learning relies on a large-scale labelled dataset, which is not always available. RESULTS: We present an end-to-end deep learning framework, ELECTRA-DTA, to predict the binding affinity of drug-target pairs. This framework incorporates an unsupervised learning mechanism to train two ELECTRA-based contextual embedding models, one for protein amino acids and the other for compound SMILES string encoding. In addition, ELECTRA-DTA leverages a squeeze-and-excitation (SE) convolutional neural network block stacked over three fully connected layers to further capture the sequential and spatial features of the protein sequence and SMILES for the DTA regression task. Experimental evaluations show that ELECTRA-DTA outperforms various state-of-the-art DTA prediction models, especially with the challenging, interaction-sparse BindingDB dataset. In target selection and drug repurposing for COVID-19, ELECTRA-DTA also offers competitive performance, suggesting its potential in speeding drug discovery and generalizability for other compound- or protein-related computational tasks. SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1186/s13321-022-00591-x.
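The squeeze-and-excitation (SE) mechanism named in the description can be illustrated with a minimal sketch. This is the generic SE idea (global average pooling per channel, a small two-layer gating network, channel-wise rescaling), not the authors' implementation: the function name `se_block`, the channel count, the reduction ratio, and the random weights are all assumptions chosen for illustration, since the abstract does not specify them.

```python
import math
import random


def se_block(features, w1, w2):
    """Apply a squeeze-and-excitation gate to a 1D feature map.

    features: list of C channels, each a list of L floats
    w1: r x C weight matrix (reduction layer), hypothetical values
    w2: C x r weight matrix (expansion layer), hypothetical values
    Returns (rescaled features, per-channel gates).
    """
    # Squeeze: global average pooling over the sequence axis, one value per channel.
    squeezed = [sum(channel) / len(channel) for channel in features]

    # Excitation, step 1: fully connected layer followed by ReLU.
    hidden = [max(0.0, sum(w * s for w, s in zip(row, squeezed))) for row in w1]

    # Excitation, step 2: fully connected layer followed by a sigmoid,
    # giving one gate in (0, 1) per channel.
    gates = [
        1.0 / (1.0 + math.exp(-sum(w * h for w, h in zip(row, hidden))))
        for row in w2
    ]

    # Scale: multiply every position of a channel by that channel's gate.
    scaled = [[g * x for x in channel] for g, channel in zip(gates, features)]
    return scaled, gates


# Toy input: 4 channels, sequence length 3 (illustrative numbers only).
features = [
    [0.1, 0.2, 0.3],
    [1.0, 0.5, 0.0],
    [0.4, 0.4, 0.4],
    [0.9, 0.1, 0.5],
]

# Assumed reduction ratio of 2: 4 channels -> 2 hidden units -> 4 gates.
rng = random.Random(0)
w1 = [[rng.uniform(-1.0, 1.0) for _ in range(4)] for _ in range(2)]
w2 = [[rng.uniform(-1.0, 1.0) for _ in range(2)] for _ in range(4)]

scaled, gates = se_block(features, w1, w2)
print("gates:", [round(g, 3) for g in gates])
```

The output keeps the shape of the input; only the relative weight of each channel changes. In a trained model the two weight matrices would be learned together with the rest of the network rather than drawn at random.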
format Online
Article
Text
id pubmed-8922401
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher Springer International Publishing
record_format MEDLINE/PubMed
spelling pubmed-8922401 2022-03-15
ELECTRA-DTA: a new compound-protein binding affinity prediction model based on the contextualized sequence encoding
Wang, Junjie; Wen, NaiFeng; Wang, Chunyu; Zhao, Lingling; Cheng, Liang
J Cheminform
Research Article
Springer International Publishing 2022-03-15
/pmc/articles/PMC8922401/ /pubmed/35292100 http://dx.doi.org/10.1186/s13321-022-00591-x
Text en
© The Author(s) 2022
Open Access: This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit https://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (https://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
title ELECTRA-DTA: a new compound-protein binding affinity prediction model based on the contextualized sequence encoding
topic Research Article