
Sampling of structure and sequence space of small protein folds

Nature only samples a small fraction of the sequence space that can fold into stable proteins. Furthermore, small structural variations in a single fold, sometimes only a few amino acids, can define a protein’s molecular function. Hence, to design proteins with novel functionalities, such as molecul...


Bibliographic Details
Main authors: Linsky, Thomas W., Noble, Kyle, Tobin, Autumn R., Crow, Rachel, Carter, Lauren, Urbauer, Jeffrey L., Baker, David, Strauch, Eva-Maria
Format: Online Article Text
Language: English
Published: Nature Publishing Group UK 2022
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9684540/
https://www.ncbi.nlm.nih.gov/pubmed/36418330
http://dx.doi.org/10.1038/s41467-022-34937-8
_version_ 1784835309420150784
author Linsky, Thomas W.
Noble, Kyle
Tobin, Autumn R.
Crow, Rachel
Carter, Lauren
Urbauer, Jeffrey L.
Baker, David
Strauch, Eva-Maria
author_sort Linsky, Thomas W.
collection PubMed
description Nature only samples a small fraction of the sequence space that can fold into stable proteins. Furthermore, small structural variations in a single fold, sometimes only a few amino acids, can define a protein’s molecular function. Hence, to design proteins with novel functionalities, such as molecular recognition, methods to control and sample shape diversity are necessary. To explore this space, we developed and experimentally validated a computational platform that can design a wide variety of small protein folds while sampling shape diversity. We designed and evaluated stability of about 30,000 de novo protein designs of eight different folds. Among these designs, about 6,200 stable proteins were identified, including some predicted to have a first-of-its-kind minimalized thioredoxin fold. Obtained data revealed protein folding rules for structural features such as helix-connecting loops. Beyond serving as a resource for protein engineering, this massive and diverse dataset also provides training data for machine learning. We developed an accurate classifier to predict the stability of our designed proteins. The methods and the wide range of protein shapes provide a basis for designing new protein functions without compromising stability.
format Online
Article
Text
id pubmed-9684540
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher Nature Publishing Group UK
record_format MEDLINE/PubMed
spelling pubmed-9684540 2022-11-25 Sampling of structure and sequence space of small protein folds Linsky, Thomas W. Noble, Kyle Tobin, Autumn R. Crow, Rachel Carter, Lauren Urbauer, Jeffrey L. Baker, David Strauch, Eva-Maria Nat Commun Article
Nature Publishing Group UK 2022-11-22 /pmc/articles/PMC9684540/ /pubmed/36418330 http://dx.doi.org/10.1038/s41467-022-34937-8 Text en © The Author(s) 2022 https://creativecommons.org/licenses/by/4.0/ Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.
title Sampling of structure and sequence space of small protein folds
topic Article