Super High-Throughput Screening of Enzyme Variants by Spectral Graph Convolutional Neural Networks

Finding new enzyme variants with the desired substrate scope requires screening through a large number of potential variants. In a typical in silico enzyme engineering workflow, it is possible to scan a few thousand variants and gather several candidates for further screening...

Bibliographic Details
Main Authors: Ramírez-Palacios, Carlos, Marrink, Siewert J.
Format: Online Article Text
Language: English
Published: American Chemical Society 2023
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10373491/
https://www.ncbi.nlm.nih.gov/pubmed/36961994
http://dx.doi.org/10.1021/acs.jctc.2c01227
_version_ 1785078580057735168
author Ramírez-Palacios, Carlos
Marrink, Siewert J.
author_facet Ramírez-Palacios, Carlos
Marrink, Siewert J.
author_sort Ramírez-Palacios, Carlos
collection PubMed
description Finding new enzyme variants with the desired substrate scope requires screening through a large number of potential variants. In a typical in silico enzyme engineering workflow, it is possible to scan a few thousand variants and gather several candidates for further screening or experimental verification. In this work, we show that a Graph Convolutional Neural Network (GCN) can be trained to predict the binding energy of combinatorial libraries of enzyme complexes using only sequence information. The GCN model uses a stack of message-passing and graph pooling layers to extract information from the protein input graph and yield a prediction. The GCN model is agnostic to the identity of the ligand, which is kept constant within the mutant libraries. Using a minuscule subset of the total combinatorial space (20^4–20^8 mutants) as training data, the proposed GCN model achieves high accuracy in predicting the binding energy of unseen variants. The network’s accuracy was further improved by injecting feature embeddings obtained from a language module pretrained on 10 million protein sequences. Since no structural information is needed to evaluate new variants, the deep learning algorithm is capable of scoring an enzyme variant in under 1 ms, allowing the search of billions of candidates on a single GPU.
format Online
Article
Text
id pubmed-10373491
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher American Chemical Society
record_format MEDLINE/PubMed
spelling pubmed-10373491 2023-07-28 Super High-Throughput Screening of Enzyme Variants by Spectral Graph Convolutional Neural Networks Ramírez-Palacios, Carlos Marrink, Siewert J. J Chem Theory Comput American Chemical Society 2023-03-24 /pmc/articles/PMC10373491/ /pubmed/36961994 http://dx.doi.org/10.1021/acs.jctc.2c01227 Text en © 2023 The Authors. Published by American Chemical Society https://creativecommons.org/licenses/by/4.0/ Permits the broadest form of re-use including for commercial purposes, provided that author attribution and integrity are maintained (https://creativecommons.org/licenses/by/4.0/).
spellingShingle Ramírez-Palacios, Carlos
Marrink, Siewert J.
Super High-Throughput Screening of Enzyme Variants by Spectral Graph Convolutional Neural Networks
title Super High-Throughput Screening of Enzyme Variants by Spectral Graph Convolutional Neural Networks
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10373491/
https://www.ncbi.nlm.nih.gov/pubmed/36961994
http://dx.doi.org/10.1021/acs.jctc.2c01227
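
The abstract in this record describes a stack of message-passing and graph pooling layers that turns a protein graph into a single predicted value. A minimal sketch of that idea in plain Python follows. It assumes the widely used renormalized propagation rule ReLU(D^-1/2 (A + I) D^-1/2 H W) followed by mean pooling; the record does not state the authors' exact layer, graph construction, or node features, so the toy chain graph, the function names, and all weights below are hypothetical and chosen only for illustration.

```python
import math


def normalized_adjacency(adj):
    """Return D^-1/2 (A + I) D^-1/2 for a symmetric adjacency matrix A."""
    n = len(adj)
    # Add self-loops so each node keeps part of its own features.
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
             for i in range(n)]
    deg = [sum(row) for row in a_hat]
    return [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)]
            for i in range(n)]


def matmul(a, b):
    """Plain matrix product of two lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def gcn_layer(a_norm, h, w):
    """One message-passing step: ReLU(A_norm @ H @ W)."""
    z = matmul(matmul(a_norm, h), w)
    return [[max(0.0, v) for v in row] for row in z]


def mean_pool(h):
    """Graph-level readout: average each feature over all nodes."""
    n = len(h)
    return [sum(col) / n for col in zip(*h)]


def predict(adj, features, weights, readout_w):
    """Stack GCN layers, pool to one vector, apply a linear readout."""
    a_norm = normalized_adjacency(adj)
    h = features
    for w in weights:
        h = gcn_layer(a_norm, h, w)
    pooled = mean_pool(h)
    return sum(p * r for p, r in zip(pooled, readout_w))


# Hypothetical toy input: three residues connected as a chain 0-1-2,
# with two made-up features per residue and untrained example weights.
adj = [[0, 1, 0],
       [1, 0, 1],
       [0, 1, 0]]
features = [[1.0, 0.0],
            [0.0, 1.0],
            [1.0, 0.0]]
weights = [[[1.0, 0.5],
            [-0.5, 1.0]]]       # a single layer with a 2x2 weight matrix
readout_w = [1.0, -1.0]

score = predict(adj, features, weights, readout_w)
print(score)
```

Because the readout averages over nodes, the score does not depend on the order in which residues are numbered, which is one reason pooling layers are a common choice for graph-level regression. In a trained model the weights would be fitted to binding energies rather than set by hand.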