Cargando…

Elucidation of Genome-wide Understudied Proteins targeted by PROTAC-induced degradation using Interpretable Machine Learning

Proteolysis-targeting chimeras (PROTACs) are hetero-bifunctional molecules. They induce the degradation of a target protein by recruiting an E3 ligase to the target. The PROTAC can inactivate disease-related genes that are considered as understudied, thus has a great potential to be a new type of th...

Descripción completa

Detalles Bibliográficos
Autores principales: Xie, Li, Xie, Lei
Formato: Online Artículo Texto
Lenguaje:English
Publicado: Cold Spring Harbor Laboratory 2023
Materias:
Acceso en línea:https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9980153/
https://www.ncbi.nlm.nih.gov/pubmed/36865212
http://dx.doi.org/10.1101/2023.02.23.529828
_version_ 1784899857966694400
author Xie, Li
Xie, Lei
author_facet Xie, Li
Xie, Lei
author_sort Xie, Li
collection PubMed
description Proteolysis-targeting chimeras (PROTACs) are hetero-bifunctional molecules. They induce the degradation of a target protein by recruiting an E3 ligase to the target. The PROTAC can inactivate disease-related genes that are considered as understudied, thus has a great potential to be a new type of therapy for the treatment of incurable diseases. However, only hundreds of proteins have been experimentally tested if they are amenable to the PROTACs. It remains elusive what other proteins can be targeted by the PROTAC in the entire human genome. For the first time, we have developed an interpretable machine learning model PrePROTAC, which is based on a transformer-based protein sequence descriptor and random forest classification to predict genome-wide PROTAC-induced targets degradable by CRBN, one of the E3 ligases. In the benchmark studies, PrePROTAC achieved ROC-AUC of 0.81, PR-AUC of 0.84, and over 40% sensitivity at a false positive rate of 0.05, respectively. Furthermore, we developed an embedding SHapley Additive exPlanations (eSHAP) method to identify positions in the protein structure, which play key roles in the PROTAC activity. The key residues identified were consistent with our existing knowledge. We applied PrePROTAC to identify more than 600 novel understudied proteins that are potentially degradable by CRBN, and proposed PROTAC compounds for three novel drug targets associated with Alzheimer’s disease.
format Online
Article
Text
id pubmed-9980153
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher Cold Spring Harbor Laboratory
record_format MEDLINE/PubMed
spelling pubmed-99801532023-03-03 Elucidation of Genome-wide Understudied Proteins targeted by PROTAC-induced degradation using Interpretable Machine Learning Xie, Li Xie, Lei bioRxiv Article Proteolysis-targeting chimeras (PROTACs) are hetero-bifunctional molecules. They induce the degradation of a target protein by recruiting an E3 ligase to the target. The PROTAC can inactivate disease-related genes that are considered as understudied, thus has a great potential to be a new type of therapy for the treatment of incurable diseases. However, only hundreds of proteins have been experimentally tested if they are amenable to the PROTACs. It remains elusive what other proteins can be targeted by the PROTAC in the entire human genome. For the first time, we have developed an interpretable machine learning model PrePROTAC, which is based on a transformer-based protein sequence descriptor and random forest classification to predict genome-wide PROTAC-induced targets degradable by CRBN, one of the E3 ligases. In the benchmark studies, PrePROTAC achieved ROC-AUC of 0.81, PR-AUC of 0.84, and over 40% sensitivity at a false positive rate of 0.05, respectively. Furthermore, we developed an embedding SHapley Additive exPlanations (eSHAP) method to identify positions in the protein structure, which play key roles in the PROTAC activity. The key residues identified were consistent with our existing knowledge. We applied PrePROTAC to identify more than 600 novel understudied proteins that are potentially degradable by CRBN, and proposed PROTAC compounds for three novel drug targets associated with Alzheimer’s disease. Cold Spring Harbor Laboratory 2023-02-24 /pmc/articles/PMC9980153/ /pubmed/36865212 http://dx.doi.org/10.1101/2023.02.23.529828 Text en https://creativecommons.org/licenses/by/4.0/This work is licensed under a Creative Commons Attribution 4.0 International License (https://creativecommons.org/licenses/by/4.0/) , which allows reusers to distribute, remix, adapt, and build upon the material in any medium or format, so long as attribution is given to the creator. The license allows for commercial use.
spellingShingle Article
Xie, Li
Xie, Lei
Elucidation of Genome-wide Understudied Proteins targeted by PROTAC-induced degradation using Interpretable Machine Learning
title Elucidation of Genome-wide Understudied Proteins targeted by PROTAC-induced degradation using Interpretable Machine Learning
title_full Elucidation of Genome-wide Understudied Proteins targeted by PROTAC-induced degradation using Interpretable Machine Learning
title_fullStr Elucidation of Genome-wide Understudied Proteins targeted by PROTAC-induced degradation using Interpretable Machine Learning
title_full_unstemmed Elucidation of Genome-wide Understudied Proteins targeted by PROTAC-induced degradation using Interpretable Machine Learning
title_short Elucidation of Genome-wide Understudied Proteins targeted by PROTAC-induced degradation using Interpretable Machine Learning
title_sort elucidation of genome-wide understudied proteins targeted by protac-induced degradation using interpretable machine learning
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9980153/
https://www.ncbi.nlm.nih.gov/pubmed/36865212
http://dx.doi.org/10.1101/2023.02.23.529828
work_keys_str_mv AT xieli elucidationofgenomewideunderstudiedproteinstargetedbyprotacinduceddegradationusinginterpretablemachinelearning
AT xielei elucidationofgenomewideunderstudiedproteinstargetedbyprotacinduceddegradationusinginterpretablemachinelearning