Development of Machine Learning Models for Accurately Predicting and Ranking the Activity of Lead Molecules to Inhibit PRC2 Dependent Cancer

Disruption of epigenetic processes to eradicate tumor cells is among the most promising interventions for cancer control. EZH2 (Enhancer of zeste homolog 2), a catalytic component of polycomb repressive complex 2 (PRC2), methylates lysine 27 of histone H3 to promote transcriptional silencing and is...

Bibliographic Details
Main Authors: Danishuddin; Kumar, Vikas; Parate, Shraddha; Bahuguna, Ashutosh; Lee, Gihwan; Kim, Myeong Ok; Lee, Keun Woo
Format: Online Article Text
Language: English
Published: MDPI 2021
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8308948/
https://www.ncbi.nlm.nih.gov/pubmed/34358125
http://dx.doi.org/10.3390/ph14070699
author Danishuddin,
Kumar, Vikas
Parate, Shraddha
Bahuguna, Ashutosh
Lee, Gihwan
Kim, Myeong Ok
Lee, Keun Woo
collection PubMed
description Disruption of epigenetic processes to eradicate tumor cells is among the most promising interventions for cancer control. EZH2 (Enhancer of zeste homolog 2), a catalytic component of polycomb repressive complex 2 (PRC2), methylates lysine 27 of histone H3 to promote transcriptional silencing and is an important drug target for controlling cancer via epigenetic processes. In the present study, we developed several models for predicting the inhibitory activity of compounds against EZH2. Binary and multiclass models were built using SVM, random forest and XGBoost methods. Rigorous validation approaches, including the predictiveness curve, Y-randomization and applicability domain (AD) analysis, were employed to evaluate the developed models. Eighteen descriptors selected by the Boruta method were used for modeling. For binary classification, random forest and XGBoost achieved accuracies of 0.80 and 0.82, respectively, on the external test set. For the multiclass models, random forest and XGBoost achieved accuracies of 0.73 and 0.75, respectively. Five hundred Y-randomization runs demonstrated that the models were robust and that the correlations did not arise by chance. Evaluation metrics from the predictiveness curve show that the selected eighteen descriptors predict active compounds with a total gain (TG) of 0.79 and 0.59 for XGBoost and random forest, respectively. The validated models were then used for virtual screening and molecular docking in search of potential hits. A total of 221 compounds were commonly predicted as active above the set probability threshold and fell within the AD of the training set. Molecular docking revealed that three compounds have reasonable binding energies and favorable interactions with critical residues in the active site of EZH2. In conclusion, we highlight the potential of rigorously validated models for accurately predicting and ranking the activities of lead molecules against cancer epigenetic targets. The models presented in this study provide a platform for the development of EZH2 inhibitors.
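The Y-randomization check named in the abstract can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the study: the synthetic two-descriptor data, the nearest-centroid stand-in classifier (the study itself used SVM, random forest and XGBoost), and the helper names `holdout_accuracy` and `y_randomization`. The idea shown is only the general one: permute the activity labels, rebuild and score the model, repeat, and compare the resulting score distribution with the score obtained on the true labels.

```python
import random
import statistics


def nearest_centroid_fit(X, y):
    """Fit a deliberately simple stand-in model: one mean vector per class."""
    centroids = {}
    for label in set(y):
        rows = [x for x, lab in zip(X, y) if lab == label]
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]
    return centroids


def nearest_centroid_predict(centroids, X):
    """Assign each sample to the class with the closest centroid."""
    def dist2(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return [min(centroids, key=lambda lab: dist2(x, centroids[lab])) for x in X]


def holdout_accuracy(X, y, n_train):
    """Train on the first n_train samples and score on the remainder."""
    model = nearest_centroid_fit(X[:n_train], y[:n_train])
    preds = nearest_centroid_predict(model, X[n_train:])
    truth = y[n_train:]
    return sum(p == t for p, t in zip(preds, truth)) / len(truth)


def y_randomization(X, y, n_train, n_runs, rng):
    """Repeat the fit/score cycle on label vectors permuted across all samples."""
    scores = []
    for _ in range(n_runs):
        y_perm = y[:]          # copy so the true labels stay untouched
        rng.shuffle(y_perm)    # break any descriptor/activity relationship
        scores.append(holdout_accuracy(X, y_perm, n_train))
    return scores


# Synthetic, well-separated two-class data (hypothetical, for illustration only).
rng = random.Random(0)
y = [i % 2 for i in range(300)]
X = [[rng.gauss(4.0 * lab, 1.0), rng.gauss(4.0 * lab, 1.0)] for lab in y]

real_acc = holdout_accuracy(X, y, n_train=200)
rand_accs = y_randomization(X, y, n_train=200, n_runs=500, rng=rng)

# Empirical p-value: how often a label-permuted model matches the real one.
p_value = (1 + sum(a >= real_acc for a in rand_accs)) / (1 + len(rand_accs))

print(f"accuracy on true labels:        {real_acc:.2f}")
print(f"mean accuracy on permuted runs: {statistics.mean(rand_accs):.2f}")
print(f"empirical p-value:              {p_value:.4f}")
```

If the accuracy on the true labels sits far above the distribution obtained from permuted labels, a chance correlation is an unlikely explanation for the model's performance. The run count of 500 mirrors the number quoted in the abstract; the split size and the scoring metric are arbitrary choices here.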
format Online
Article
Text
id pubmed-8308948
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher MDPI
record_format MEDLINE/PubMed
spelling pubmed-8308948 2021-07-25
Pharmaceuticals (Basel), Article
MDPI, 2021-07-20
© 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
title Development of Machine Learning Models for Accurately Predicting and Ranking the Activity of Lead Molecules to Inhibit PRC2 Dependent Cancer
topic Article