Large-scale design and refinement of stable proteins using sequence-only models

Bibliographic Details
Main authors: Singer, Jedediah M., Novotney, Scott, Strickland, Devin, Haddox, Hugh K., Leiby, Nicholas, Rocklin, Gabriel J., Chow, Cameron M., Roy, Anindya, Bera, Asim K., Motta, Francis C., Cao, Longxing, Strauch, Eva-Maria, Chidyausiku, Tamuka M., Ford, Alex, Ho, Ethan, Zaitzeff, Alexander, Mackenzie, Craig O., Eramian, Hamed, DiMaio, Frank, Grigoryan, Gevorg, Vaughn, Matthew, Stewart, Lance J., Baker, David, Klavins, Eric
Format: Online Article Text
Language: English
Published: Public Library of Science, 2022
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8920274/
https://www.ncbi.nlm.nih.gov/pubmed/35286324
http://dx.doi.org/10.1371/journal.pone.0265020
author Singer, Jedediah M.
Novotney, Scott
Strickland, Devin
Haddox, Hugh K.
Leiby, Nicholas
Rocklin, Gabriel J.
Chow, Cameron M.
Roy, Anindya
Bera, Asim K.
Motta, Francis C.
Cao, Longxing
Strauch, Eva-Maria
Chidyausiku, Tamuka M.
Ford, Alex
Ho, Ethan
Zaitzeff, Alexander
Mackenzie, Craig O.
Eramian, Hamed
DiMaio, Frank
Grigoryan, Gevorg
Vaughn, Matthew
Stewart, Lance J.
Baker, David
Klavins, Eric
collection PubMed
description Engineered proteins generally must possess a stable structure in order to achieve their designed function. Stable designs, however, are astronomically rare within the space of all possible amino acid sequences. As a consequence, many designs must be tested computationally and experimentally in order to find stable ones, which is expensive in terms of time and resources. Here we use a high-throughput, low-fidelity assay to experimentally evaluate the stability of approximately 200,000 novel proteins. These include a wide range of sequence perturbations, providing a baseline for future work in the field. We build a neural network model that predicts protein stability given only sequences of amino acids, and compare its performance to the assayed values. We also report another network model that is able to generate the amino acid sequences of novel stable proteins given requested secondary sequences. Finally, we show that the predictive model—despite weaknesses including a noisy data set—can be used to substantially increase the stability of both expert-designed and model-generated proteins.
format Online
Article
Text
id pubmed-8920274
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-8920274 2022-03-15 PLoS One Research Article
Public Library of Science 2022-03-14 /pmc/articles/PMC8920274/ /pubmed/35286324 http://dx.doi.org/10.1371/journal.pone.0265020 Text en © 2022 Singer et al. https://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
title Large-scale design and refinement of stable proteins using sequence-only models
topic Research Article