Super High-Throughput Screening of Enzyme Variants by Spectral Graph Convolutional Neural Networks

Bibliographic Details
Main Authors: Ramírez-Palacios, Carlos; Marrink, Siewert J.
Format: Online Article Text
Language: English
Published: American Chemical Society, 2023
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10373491/
https://www.ncbi.nlm.nih.gov/pubmed/36961994
http://dx.doi.org/10.1021/acs.jctc.2c01227
Description
Summary: Finding new enzyme variants with the desired substrate scope requires screening a large number of potential variants. In a typical in silico enzyme engineering workflow, it is possible to scan a few thousand variants and gather several candidates for further screening or experimental verification. In this work, we show that a Graph Convolutional Neural Network (GCN) can be trained to predict the binding energy of combinatorial libraries of enzyme complexes using only sequence information. The GCN model uses a stack of message-passing and graph pooling layers to extract information from the protein input graph and yield a prediction. The GCN model is agnostic to the identity of the ligand, which is kept constant within the mutant libraries. Using a minuscule subset of the total combinatorial space (20^4–20^8 mutants) as training data, the proposed GCN model achieves high accuracy in predicting the binding energy of unseen variants. The network’s accuracy was further improved by injecting feature embeddings obtained from a language module pretrained on 10 million protein sequences. Since no structural information is needed to evaluate new variants, the deep learning algorithm can score an enzyme variant in under 1 ms, allowing billions of candidates to be searched on a single GPU.
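The summary does not say how the protein graph is built from sequence or which pooling operator is used, so the following is only a minimal NumPy sketch of the general idea under stated assumptions, not the authors' model. Assumed (hypothetical) choices: each residue is a node with a one-hot amino-acid feature; edges link sequence neighbours only; each layer uses the symmetric normalized propagation of Kipf and Welling's GCN; and a single global mean pool stands in for the paper's graph pooling layers. The pretrained language-model embeddings are omitted. Weights are random and untrained, so the scores are meaningless; the sketch only shows the data flow from a sequence to one scalar binding-energy prediction, and how a combinatorial library of 20^n variants arises.

```python
import itertools
import numpy as np

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 canonical residues


def library_size(n_positions: int) -> int:
    """Number of variants when each of n mutated positions may be any of 20 residues."""
    return len(AMINO_ACIDS) ** n_positions


def enumerate_library(wildtype: str, positions):
    """Yield every combinatorial variant of `wildtype` at the given 0-based positions."""
    for combo in itertools.product(AMINO_ACIDS, repeat=len(positions)):
        seq = list(wildtype)
        for pos, aa in zip(positions, combo):
            seq[pos] = aa
        yield "".join(seq)


def one_hot(seq: str) -> np.ndarray:
    """Node features: one row per residue, one column per amino-acid type."""
    x = np.zeros((len(seq), len(AMINO_ACIDS)))
    for i, aa in enumerate(seq):
        x[i, AMINO_ACIDS.index(aa)] = 1.0
    return x


def chain_adjacency(n: int) -> np.ndarray:
    """Hypothetical sequence-only graph: residue i links to i-1 and i+1, plus self-loops."""
    a = np.eye(n)
    idx = np.arange(n - 1)
    a[idx, idx + 1] = 1.0
    a[idx + 1, idx] = 1.0
    return a


def normalize(a: np.ndarray) -> np.ndarray:
    """Symmetric GCN normalization D^-1/2 A D^-1/2."""
    d_inv_sqrt = 1.0 / np.sqrt(a.sum(axis=1))
    return a * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]


def predict_energy(seq: str, layer_weights, w_out: np.ndarray) -> float:
    """Message passing over the residue graph, global mean pooling, linear readout."""
    a_hat = normalize(chain_adjacency(len(seq)))
    h = one_hot(seq)
    for w in layer_weights:
        h = np.maximum(a_hat @ h @ w, 0.0)  # aggregate neighbours, transform, ReLU
    pooled = h.mean(axis=0)  # simple stand-in for the paper's graph pooling layers
    return float(pooled @ w_out)


# Usage with untrained random weights (for illustration of shapes only).
rng = np.random.default_rng(0)
weights = [rng.normal(size=(20, 16)), rng.normal(size=(16, 16))]
w_out = rng.normal(size=16)
scores = {
    variant: predict_energy(variant, weights, w_out)
    for variant in enumerate_library("MKTAYIAKQR", [2, 5])  # hypothetical 10-residue sequence
}
print(library_size(4), library_size(8))  # 160000 25600000000
```

Because scoring a variant here needs only a few small matrix products on the sequence, no structure modelling, evaluating each candidate is cheap; this is the property the summary credits for making very large libraries searchable.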