Mega-scale experimental analysis of protein folding stability in biology and design

Bibliographic Details
Main authors: Tsuboyama, Kotaro; Dauparas, Justas; Chen, Jonathan; Laine, Elodie; Mohseni Behbahani, Yasser; Weinstein, Jonathan J.; Mangan, Niall M.; Ovchinnikov, Sergey; Rocklin, Gabriel J.
Format: Online Article Text
Language: English
Published: Nature Publishing Group UK 2023
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10412457/
https://www.ncbi.nlm.nih.gov/pubmed/37468638
http://dx.doi.org/10.1038/s41586-023-06328-6
_version_ 1785086909146464256
author Tsuboyama, Kotaro
Dauparas, Justas
Chen, Jonathan
Laine, Elodie
Mohseni Behbahani, Yasser
Weinstein, Jonathan J.
Mangan, Niall M.
Ovchinnikov, Sergey
Rocklin, Gabriel J.
collection PubMed
description Advances in DNA sequencing and machine learning are providing insights into protein sequences and structures on an enormous scale(1). However, the energetics driving folding are invisible in these structures and remain largely unknown(2). The hidden thermodynamics of folding can drive disease(3,4), shape protein evolution(5–7) and guide protein engineering(8–10), and new approaches are needed to reveal these thermodynamics for every sequence and structure. Here we present cDNA display proteolysis, a method for measuring thermodynamic folding stability for up to 900,000 protein domains in a one-week experiment. From 1.8 million measurements in total, we curated a set of around 776,000 high-quality folding stabilities covering all single amino acid variants and selected double mutants of 331 natural and 148 de novo designed protein domains 40–72 amino acids in length. Using this extensive dataset, we quantified (1) environmental factors influencing amino acid fitness, (2) thermodynamic couplings (including unexpected interactions) between protein sites, and (3) the global divergence between evolutionary amino acid usage and protein folding stability. We also examined how our approach could identify stability determinants in designed proteins and evaluate design methods. The cDNA display proteolysis method is fast, accurate and uniquely scalable, and promises to reveal the quantitative rules for how amino acid sequences encode folding stability.
format Online
Article
Text
id pubmed-10412457
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher Nature Publishing Group UK
record_format MEDLINE/PubMed
spelling pubmed-10412457 2023-08-11 Nature Article
Nature Publishing Group UK 2023-07-19 2023 /pmc/articles/PMC10412457/ /pubmed/37468638 http://dx.doi.org/10.1038/s41586-023-06328-6 Text en © The Author(s) 2023 https://creativecommons.org/licenses/by/4.0/ Open Access: This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
title Mega-scale experimental analysis of protein folding stability in biology and design
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10412457/
https://www.ncbi.nlm.nih.gov/pubmed/37468638
http://dx.doi.org/10.1038/s41586-023-06328-6