Large-scale design and refinement of stable proteins using sequence-only models
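Main authors: | Singer, Jedediah M.; Novotney, Scott; Strickland, Devin; Haddox, Hugh K.; Leiby, Nicholas; Rocklin, Gabriel J.; Chow, Cameron M.; Roy, Anindya; Bera, Asim K.; Motta, Francis C.; Cao, Longxing; Strauch, Eva-Maria; Chidyausiku, Tamuka M.; Ford, Alex; Ho, Ethan; Zaitzeff, Alexander; Mackenzie, Craig O.; Eramian, Hamed; DiMaio, Frank; Grigoryan, Gevorg; Vaughn, Matthew; Stewart, Lance J.; Baker, David; Klavins, Eric |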
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Public Library of Science, 2022 |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8920274/ https://www.ncbi.nlm.nih.gov/pubmed/35286324 http://dx.doi.org/10.1371/journal.pone.0265020 |
_version_ | 1784669092745052160 |
---|---|
author | Singer, Jedediah M.; Novotney, Scott; Strickland, Devin; Haddox, Hugh K.; Leiby, Nicholas; Rocklin, Gabriel J.; Chow, Cameron M.; Roy, Anindya; Bera, Asim K.; Motta, Francis C.; Cao, Longxing; Strauch, Eva-Maria; Chidyausiku, Tamuka M.; Ford, Alex; Ho, Ethan; Zaitzeff, Alexander; Mackenzie, Craig O.; Eramian, Hamed; DiMaio, Frank; Grigoryan, Gevorg; Vaughn, Matthew; Stewart, Lance J.; Baker, David; Klavins, Eric |
author_sort | Singer, Jedediah M. |
collection | PubMed |
description | Engineered proteins generally must possess a stable structure in order to achieve their designed function. Stable designs, however, are astronomically rare within the space of all possible amino acid sequences. As a consequence, many designs must be tested computationally and experimentally in order to find stable ones, which is expensive in terms of time and resources. Here we use a high-throughput, low-fidelity assay to experimentally evaluate the stability of approximately 200,000 novel proteins. These include a wide range of sequence perturbations, providing a baseline for future work in the field. We build a neural network model that predicts protein stability given only sequences of amino acids, and compare its performance to the assayed values. We also report another network model that is able to generate the amino acid sequences of novel stable proteins given requested secondary-structure sequences. Finally, we show that the predictive model, despite weaknesses including a noisy data set, can be used to substantially increase the stability of both expert-designed and model-generated proteins. |
format | Online Article Text |
id | pubmed-8920274 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2022 |
publisher | Public Library of Science |
record_format | MEDLINE/PubMed |
spelling | pubmed-8920274 2022-03-15 PLoS One Research Article Public Library of Science 2022-03-14 /pmc/articles/PMC8920274/ /pubmed/35286324 http://dx.doi.org/10.1371/journal.pone.0265020 Text en © 2022 Singer et al. https://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. |
title | Large-scale design and refinement of stable proteins using sequence-only models |
topic | Research Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8920274/ https://www.ncbi.nlm.nih.gov/pubmed/35286324 http://dx.doi.org/10.1371/journal.pone.0265020 |