Sequence statistics of tertiary structural motifs reflect protein stability

Bibliographic Details
Main authors:	Zheng, Fan; Grigoryan, Gevorg
Format:	Online Article Text
Language:	English
Published:	Public Library of Science 2017
Subjects:	Research Article
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5446159/ https://www.ncbi.nlm.nih.gov/pubmed/28552940 http://dx.doi.org/10.1371/journal.pone.0178272

author	Zheng, Fan; Grigoryan, Gevorg
collection	PubMed
description	The Protein Data Bank (PDB) has been a key resource for learning general rules of sequence-structure relationships in proteins. Quantitative insights have been gained by defining geometric descriptors of structure (e.g., distances, dihedral angles, solvent exposure, etc.) and observing their distributions and sequence preferences. Here we argue that as the PDB continues to grow, it may become unnecessary to reduce structure into a set of elementary descriptors. Instead, it could be possible to deduce quantitative sequence-structure relationships in the context of precisely defined complex structural motifs by mining the PDB for closely matching backbone geometries. To validate this idea, we turned to the task of predicting changes in protein stability upon amino-acid substitution, a difficult problem of broad significance. We defined non-contiguous tertiary motifs (TERMs) around a protein site of interest and extracted sequence preferences from ensembles of closely matching substructures in the PDB to predict mutational stability changes at the site, ΔΔG(m). We demonstrate that these ensemble statistics predict ΔΔG(m) on par with state-of-the-art statistical and machine-learning methods on large thermodynamic datasets, and outperform these, along with a leading structure-based modeling approach, when tested in the context of unbiased diverse mutations. Further, we show that the performance of the TERM-based method is directly related to the amount of available relevant structural data, automatically improving with the growing PDB. This enables a means of estimating prediction accuracy. Our results clearly demonstrate that: 1) statistics of non-contiguous structural motifs in the PDB encode fundamental sequence-structure relationships related to protein thermodynamic stability, and 2) the PDB is now large enough that such statistics are already useful in practice, with their accuracy expected to continue increasing as the database grows. These observations suggest new ways of using structural data towards addressing problems of computational structural biology.
format	Online Article Text
id	pubmed-5446159
institution	National Center for Biotechnology Information
language	English
publishDate	2017
publisher	Public Library of Science
record_format	MEDLINE/PubMed
spelling	pubmed-5446159 2017-06-12 Sequence statistics of tertiary structural motifs reflect protein stability Zheng, Fan Grigoryan, Gevorg PLoS One Research Article Public Library of Science 2017-05-26 /pmc/articles/PMC5446159/ /pubmed/28552940 http://dx.doi.org/10.1371/journal.pone.0178272 Text en © 2017 Zheng, Grigoryan http://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
title	Sequence statistics of tertiary structural motifs reflect protein stability
topic	Research Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5446159/ https://www.ncbi.nlm.nih.gov/pubmed/28552940 http://dx.doi.org/10.1371/journal.pone.0178272
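The description field above says the method extracts sequence preferences at a site from an ensemble of closely matching PDB substructures and turns them into a stability-change estimate. This record does not give the authors' actual scoring procedure (how TERMs are defined, how matches are found, or how ensembles are combined), so the following is only a minimal sketch of the general idea. It assumes a simple Boltzmann-style log-odds score with uniform pseudocounts, a common statistical-potential construction swapped in here for illustration and not the published method. The function names `site_frequencies` and `log_odds_ddg`, the `pseudocount` and `kT` parameters, the sign convention (positive means destabilizing), and the toy ensemble are all hypothetical.

```python
import math
from collections import Counter

# The 20 standard amino acids, as one-letter codes.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def site_frequencies(residues, pseudocount=1.0):
    """Smoothed amino-acid frequencies at one site of a hypothetical ensemble.

    `residues` holds the residue found at the aligned site in each matching
    substructure. A uniform pseudocount (an assumption) is added to every
    amino acid so that residues absent from the ensemble still get a
    nonzero probability.
    """
    counts = Counter(residues)
    total = len(residues) + pseudocount * len(AMINO_ACIDS)
    return {aa: (counts[aa] + pseudocount) / total for aa in AMINO_ACIDS}


def log_odds_ddg(residues, wild_type, mutant, kT=0.593, pseudocount=1.0):
    """Assumed Boltzmann-style estimate: -kT * ln(p_mutant / p_wild_type).

    kT defaults to about 0.593 kcal/mol (roughly 298 K). Under the sign
    convention assumed here, positive values mean the mutant is rarer than
    the wild type in the ensemble and is read as destabilizing.
    """
    freqs = site_frequencies(residues, pseudocount)
    return -kT * math.log(freqs[mutant] / freqs[wild_type])


if __name__ == "__main__":
    # Toy ensemble: 12 Leu, 5 Ile and 3 Val observed at the site.
    ensemble = ["L"] * 12 + ["I"] * 5 + ["V"] * 3
    print(round(log_odds_ddg(ensemble, "L", "A"), 2))  # 1.52
    print(round(log_odds_ddg(ensemble, "L", "I"), 2))  # 0.46
```

With a small ensemble the pseudocounts dominate, the smoothed frequencies approach uniform, and every estimate shrinks toward zero; this gives one informal way to see why more matching structural data should sharpen statistics of this kind.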

Similar items