Discovery of moiety preference by Shapley value in protein kinase family using random forest models

BACKGROUND: Human protein kinases play important roles in cancers, are highly co-regulated by kinase families rather than a single kinase, and complementarily regulate signaling pathways. Even though there are > 100,000 protein kinase inhibitors, only 67 kinase drugs are currently approved by the...

Bibliographic Details
Main Authors: Huang, Yu-Wei, Hsu, Yen-Chao, Chuang, Yi-Hsuan, Chen, Yun-Ti, Lin, Xiang-Yu, Fan, You-Wei, Pathak, Nikhil, Yang, Jinn-Moon
Format: Online Article Text
Language: English
Published: BioMed Central 2022
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9011936/
https://www.ncbi.nlm.nih.gov/pubmed/35428180
http://dx.doi.org/10.1186/s12859-022-04663-5
author Huang, Yu-Wei
Hsu, Yen-Chao
Chuang, Yi-Hsuan
Chen, Yun-Ti
Lin, Xiang-Yu
Fan, You-Wei
Pathak, Nikhil
Yang, Jinn-Moon
author_sort Huang, Yu-Wei
collection PubMed
description BACKGROUND: Human protein kinases play important roles in cancers, are highly co-regulated by kinase families rather than a single kinase, and complementarily regulate signaling pathways. Even though there are > 100,000 protein kinase inhibitors, only 67 kinase drugs are currently approved by the Food and Drug Administration (FDA). RESULTS: In this study, we used “merged moiety-based interpretable features (MMIFs),” which merged four moiety-based compound features, including Checkmol fingerprint, PubChem fingerprint, rings in drugs, and in-house moieties as the input features for building random forest (RF) models. By using > 200,000 bioactivity test data, we classified inhibitors as kinase family inhibitors or non-inhibitors in the machine learning. The results showed that our RF models achieved good accuracy (> 0.8) for the 10 kinase families. In addition, we found kinase common and specific moieties across families using the Shapley Additive exPlanations (SHAP) approach. We also verified our results using protein kinase complex structures containing important interactions of the hinges, DFGs, or P-loops in the ATP pocket of active sites. CONCLUSIONS: In summary, we not only constructed highly accurate prediction models for predicting inhibitors of kinase families but also discovered common and specific inhibitor moieties between different kinase families, providing new opportunities for designing protein kinase inhibitors.
format Online
Article
Text
id pubmed-9011936
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-9011936 2022-04-16 Discovery of moiety preference by Shapley value in protein kinase family using random forest models Huang, Yu-Wei Hsu, Yen-Chao Chuang, Yi-Hsuan Chen, Yun-Ti Lin, Xiang-Yu Fan, You-Wei Pathak, Nikhil Yang, Jinn-Moon BMC Bioinformatics Research
BioMed Central 2022-04-15 /pmc/articles/PMC9011936/ /pubmed/35428180 http://dx.doi.org/10.1186/s12859-022-04663-5 Text en © The Author(s) 2022 https://creativecommons.org/licenses/by/4.0/ Open Access. This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
title Discovery of moiety preference by Shapley value in protein kinase family using random forest models
topic Research
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9011936/
https://www.ncbi.nlm.nih.gov/pubmed/35428180
http://dx.doi.org/10.1186/s12859-022-04663-5