Neural network extrapolation to distant regions of the protein fitness landscape
Machine learning (ML) has transformed protein engineering by constructing models of the underlying sequence-function landscape to accelerate the discovery of new biomolecules. ML-guided protein design requires models, trained on local sequence-function information, to accurately predict distant fitn...
Main authors: | Fahlberg, Sarah A; Freschlin, Chase R; Heinzelman, Pete; Romero, Philip A |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Cold Spring Harbor Laboratory, 2023 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10659313/ https://www.ncbi.nlm.nih.gov/pubmed/37987009 http://dx.doi.org/10.1101/2023.11.08.566287 |
_version_ | 1785148307085983744 |
---|---|
author | Fahlberg, Sarah A; Freschlin, Chase R; Heinzelman, Pete; Romero, Philip A |
author_facet | Fahlberg, Sarah A; Freschlin, Chase R; Heinzelman, Pete; Romero, Philip A |
author_sort | Fahlberg, Sarah A |
collection | PubMed |
description | Machine learning (ML) has transformed protein engineering by constructing models of the underlying sequence-function landscape to accelerate the discovery of new biomolecules. ML-guided protein design requires models, trained on local sequence-function information, to accurately predict distant fitness peaks. In this work, we evaluate neural networks’ capacity to extrapolate beyond their training data. We perform model-guided design using a panel of neural network architectures trained on protein G (GB1)-Immunoglobulin G (IgG) binding data and experimentally test thousands of GB1 designs to systematically evaluate the models’ extrapolation. We find each model architecture infers markedly different landscapes from the same data, which give rise to unique design preferences. We find simpler models excel in local extrapolation to design high fitness proteins, while more sophisticated convolutional models can venture deep into sequence space to design proteins that fold but are no longer functional. Our findings highlight how each architecture’s inductive biases prime them to learn different aspects of the protein fitness landscape. |
format | Online Article Text |
id | pubmed-10659313 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2023 |
publisher | Cold Spring Harbor Laboratory |
record_format | MEDLINE/PubMed |
spelling | pubmed-10659313 2023-11-20 Neural network extrapolation to distant regions of the protein fitness landscape Fahlberg, Sarah A; Freschlin, Chase R; Heinzelman, Pete; Romero, Philip A bioRxiv Article Machine learning (ML) has transformed protein engineering by constructing models of the underlying sequence-function landscape to accelerate the discovery of new biomolecules. ML-guided protein design requires models, trained on local sequence-function information, to accurately predict distant fitness peaks. In this work, we evaluate neural networks’ capacity to extrapolate beyond their training data. We perform model-guided design using a panel of neural network architectures trained on protein G (GB1)-Immunoglobulin G (IgG) binding data and experimentally test thousands of GB1 designs to systematically evaluate the models’ extrapolation. We find each model architecture infers markedly different landscapes from the same data, which give rise to unique design preferences. We find simpler models excel in local extrapolation to design high fitness proteins, while more sophisticated convolutional models can venture deep into sequence space to design proteins that fold but are no longer functional. Our findings highlight how each architecture’s inductive biases prime them to learn different aspects of the protein fitness landscape. Cold Spring Harbor Laboratory 2023-11-09 /pmc/articles/PMC10659313/ /pubmed/37987009 http://dx.doi.org/10.1101/2023.11.08.566287 Text en https://creativecommons.org/licenses/by/4.0/ This work is licensed under a Creative Commons Attribution 4.0 International License (https://creativecommons.org/licenses/by/4.0/), which allows reusers to distribute, remix, adapt, and build upon the material in any medium or format, so long as attribution is given to the creator. The license allows for commercial use. |
spellingShingle | Article Fahlberg, Sarah A; Freschlin, Chase R; Heinzelman, Pete; Romero, Philip A Neural network extrapolation to distant regions of the protein fitness landscape |
title | Neural network extrapolation to distant regions of the protein fitness landscape |
topic | Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10659313/ https://www.ncbi.nlm.nih.gov/pubmed/37987009 http://dx.doi.org/10.1101/2023.11.08.566287 |