
HyperCys: A Structure- and Sequence-Based Predictor of Hyper-Reactive Druggable Cysteines

The cysteine side chain has a free thiol group, making it the amino acid residue most often covalently modified by small molecules possessing weakly electrophilic warheads, thereby prolonging on-target residence time and reducing the risk of idiosyncratic drug toxicity. However, not all cysteines ar...


Bibliographic Details
Main Authors: Gao, Mingjie; Günther, Stefan
Format: Online Article Text
Language: English
Published: MDPI 2023
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10054327/
https://www.ncbi.nlm.nih.gov/pubmed/36983037
http://dx.doi.org/10.3390/ijms24065960
author Gao, Mingjie
Günther, Stefan
author_facet Gao, Mingjie
Günther, Stefan
author_sort Gao, Mingjie
collection PubMed
description The cysteine side chain has a free thiol group, making it the amino acid residue most often covalently modified by small molecules possessing weakly electrophilic warheads, thereby prolonging on-target residence time and reducing the risk of idiosyncratic drug toxicity. However, not all cysteines are equally reactive or accessible. Hence, to identify targetable cysteines, we propose a novel ensemble stacked machine learning (ML) model, named HyperCys, to predict hyper-reactive druggable cysteines. First, the pocket, conservation, structural and energy profiles, and physicochemical properties of (non)covalently bound cysteines were collected from both protein sequences and 3D structures of protein–ligand complexes. Then, we established the HyperCys ensemble stacked model by integrating six different ML models: K-nearest neighbors, support vector machine, light gradient boosting machine, multi-layer perceptron classifier, random forest, and logistic regression as the meta-classifier. Finally, the results for different feature group combinations were compared on the basis of the classification accuracy for hyper-reactive cysteines and other metrics. The results show that the accuracy, F1 score, recall score, and ROC AUC values of HyperCys are 0.784, 0.754, 0.742, and 0.824, respectively, after performing 10-fold cross-validation (CV) with the best window size. Compared to traditional ML models with only sequence-based features or only 3D structural features, HyperCys is more accurate at predicting hyper-reactive druggable cysteines. It is anticipated that HyperCys will be an effective tool for discovering new potential reactive cysteines in a wide range of nucleophilic proteins and will provide an important contribution to the design of targeted covalent inhibitors with high potency and selectivity.
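The abstract reports accuracy, F1 score, and recall averaged over 10-fold CV. The following is a minimal sketch of how such per-fold metrics can be computed and averaged, not the authors' implementation. The function names (`binary_metrics`, `kfold_indices`, `cross_validated_metrics`), the 0/1 label encoding (1 = hyper-reactive), the unshuffled fold split, and the toy predictor in the usage note are all assumptions made for illustration. ROC AUC is left out because it needs predicted scores rather than hard labels.

```python
from statistics import mean


def binary_metrics(y_true, y_pred):
    """Accuracy, precision, recall and F1 for 0/1 labels (assumed: 1 = hyper-reactive)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


def kfold_indices(n_samples, k=10):
    """Yield (train_idx, test_idx) pairs for k contiguous, unshuffled folds."""
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        test_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_samples))
        yield train_idx, test_idx
        start += size


def cross_validated_metrics(y_true, fit_predict, k=10):
    """Average the per-fold metrics.

    fit_predict(train_idx, test_idx) is a hypothetical callback that trains
    on the training indices and returns 0/1 predictions for the test indices.
    """
    per_fold = []
    for train_idx, test_idx in kfold_indices(len(y_true), k):
        y_pred = fit_predict(train_idx, test_idx)
        per_fold.append(binary_metrics([y_true[i] for i in test_idx], y_pred))
    return {name: mean(fold[name] for fold in per_fold) for name in per_fold[0]}
```

As a usage illustration, `cross_validated_metrics(labels, lambda train, test: [1] * len(test), k=10)` scores a trivial predictor that labels every cysteine as hyper-reactive; any real classifier would be plugged in through the same callback. In practice the folds would normally be shuffled and stratified by class, which this sketch omits for brevity.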
format Online
Article
Text
id pubmed-10054327
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher MDPI
record_format MEDLINE/PubMed
spelling pubmed-10054327 2023-03-30 HyperCys: A Structure- and Sequence-Based Predictor of Hyper-Reactive Druggable Cysteines. Gao, Mingjie; Günther, Stefan. Int J Mol Sci. Communication.
MDPI 2023-03-22 /pmc/articles/PMC10054327/ /pubmed/36983037 http://dx.doi.org/10.3390/ijms24065960 Text en © 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
title HyperCys: A Structure- and Sequence-Based Predictor of Hyper-Reactive Druggable Cysteines
topic Communication
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10054327/
https://www.ncbi.nlm.nih.gov/pubmed/36983037
http://dx.doi.org/10.3390/ijms24065960