Neural network extrapolation to distant regions of the protein fitness landscape

Machine learning (ML) has transformed protein engineering by constructing models of the underlying sequence-function landscape to accelerate the discovery of new biomolecules. ML-guided protein design requires models, trained on local sequence-function information, to accurately predict distant fitn...

Bibliographic Details
Main Authors: Fahlberg, Sarah A, Freschlin, Chase R, Heinzelman, Pete, Romero, Philip A
Format: Online Article Text
Language: English
Published: Cold Spring Harbor Laboratory 2023
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10659313/
https://www.ncbi.nlm.nih.gov/pubmed/37987009
http://dx.doi.org/10.1101/2023.11.08.566287
_version_ 1785148307085983744
author Fahlberg, Sarah A
Freschlin, Chase R
Heinzelman, Pete
Romero, Philip A
author_facet Fahlberg, Sarah A
Freschlin, Chase R
Heinzelman, Pete
Romero, Philip A
author_sort Fahlberg, Sarah A
collection PubMed
description Machine learning (ML) has transformed protein engineering by constructing models of the underlying sequence-function landscape to accelerate the discovery of new biomolecules. ML-guided protein design requires models, trained on local sequence-function information, to accurately predict distant fitness peaks. In this work, we evaluate neural networks’ capacity to extrapolate beyond their training data. We perform model-guided design using a panel of neural network architectures trained on protein G (GB1)-Immunoglobulin G (IgG) binding data and experimentally test thousands of GB1 designs to systematically evaluate the models’ extrapolation. We find each model architecture infers markedly different landscapes from the same data, which give rise to unique design preferences. We find simpler models excel in local extrapolation to design high fitness proteins, while more sophisticated convolutional models can venture deep into sequence space to design proteins that fold but are no longer functional. Our findings highlight how each architecture’s inductive biases prime them to learn different aspects of the protein fitness landscape.
format Online
Article
Text
id pubmed-10659313
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher Cold Spring Harbor Laboratory
record_format MEDLINE/PubMed
spelling pubmed-10659313 2023-11-20 Neural network extrapolation to distant regions of the protein fitness landscape Fahlberg, Sarah A Freschlin, Chase R Heinzelman, Pete Romero, Philip A bioRxiv Article Machine learning (ML) has transformed protein engineering by constructing models of the underlying sequence-function landscape to accelerate the discovery of new biomolecules. ML-guided protein design requires models, trained on local sequence-function information, to accurately predict distant fitness peaks. In this work, we evaluate neural networks’ capacity to extrapolate beyond their training data. We perform model-guided design using a panel of neural network architectures trained on protein G (GB1)-Immunoglobulin G (IgG) binding data and experimentally test thousands of GB1 designs to systematically evaluate the models’ extrapolation. We find each model architecture infers markedly different landscapes from the same data, which give rise to unique design preferences. We find simpler models excel in local extrapolation to design high fitness proteins, while more sophisticated convolutional models can venture deep into sequence space to design proteins that fold but are no longer functional. Our findings highlight how each architecture’s inductive biases prime them to learn different aspects of the protein fitness landscape. Cold Spring Harbor Laboratory 2023-11-09 /pmc/articles/PMC10659313/ /pubmed/37987009 http://dx.doi.org/10.1101/2023.11.08.566287 Text en https://creativecommons.org/licenses/by/4.0/ This work is licensed under a Creative Commons Attribution 4.0 International License (https://creativecommons.org/licenses/by/4.0/), which allows reusers to distribute, remix, adapt, and build upon the material in any medium or format, so long as attribution is given to the creator. The license allows for commercial use.
spellingShingle Article
Fahlberg, Sarah A
Freschlin, Chase R
Heinzelman, Pete
Romero, Philip A
Neural network extrapolation to distant regions of the protein fitness landscape
title Neural network extrapolation to distant regions of the protein fitness landscape
title_full Neural network extrapolation to distant regions of the protein fitness landscape
title_fullStr Neural network extrapolation to distant regions of the protein fitness landscape
title_full_unstemmed Neural network extrapolation to distant regions of the protein fitness landscape
title_short Neural network extrapolation to distant regions of the protein fitness landscape
title_sort neural network extrapolation to distant regions of the protein fitness landscape
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10659313/
https://www.ncbi.nlm.nih.gov/pubmed/37987009
http://dx.doi.org/10.1101/2023.11.08.566287
work_keys_str_mv AT fahlbergsaraha neuralnetworkextrapolationtodistantregionsoftheproteinfitnesslandscape
AT freschlinchaser neuralnetworkextrapolationtodistantregionsoftheproteinfitnesslandscape
AT heinzelmanpete neuralnetworkextrapolationtodistantregionsoftheproteinfitnesslandscape
AT romerophilipa neuralnetworkextrapolationtodistantregionsoftheproteinfitnesslandscape