
Size and structure of the sequence space of repeat proteins

The coding space of protein sequences is shaped by evolutionary constraints set by requirements of function and stability. We show that the coding space of a given protein family—the total number of sequences in that family—can be estimated using models of maximum entropy trained on multiple sequenc...


Bibliographic Details
Main authors: Marchi, Jacopo, Galpern, Ezequiel A., Espada, Rocio, Ferreiro, Diego U., Walczak, Aleksandra M., Mora, Thierry
Format: Online Article Text
Language: English
Published: Public Library of Science 2019
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6733475/
https://www.ncbi.nlm.nih.gov/pubmed/31415557
http://dx.doi.org/10.1371/journal.pcbi.1007282
author Marchi, Jacopo
Galpern, Ezequiel A.
Espada, Rocio
Ferreiro, Diego U.
Walczak, Aleksandra M.
Mora, Thierry
collection PubMed
description The coding space of protein sequences is shaped by evolutionary constraints set by requirements of function and stability. We show that the coding space of a given protein family, that is, the total number of sequences in that family, can be estimated using maximum entropy models trained on multiple sequence alignments of naturally occurring amino acid sequences. We analyzed three abundant repeat protein families, whose members are large proteins made of many repetitions of conserved portions of ∼30 amino acids, and calculated their sizes. While amino acid conservation at each position of the alignment explains most of the reduction of diversity relative to completely random sequences, we found that correlations between amino acid usage at different positions significantly impact that diversity. We quantified the impact of different types of correlations, functional and evolutionary, on sequence diversity. Analysis of the detailed structure of the coding space of the families revealed a rugged landscape, with many local energy minima of varying sizes organized hierarchically, reminiscent of the frustrated energy landscapes of spin glasses in physics. This clustered structure indicates a multiplicity of subtypes within each family and suggests new strategies for protein design.
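As a minimal sketch of the size estimate described above (not the authors' model), consider the simplest maximum entropy model: one constrained only by the amino acid frequencies in each alignment column. That model factorizes over columns, so its entropy S is the sum of per-column Shannon entropies, and exp(S) acts as an effective number of sequences in the family. The function names and the toy alignment are hypothetical. Because this model ignores correlations between positions, which the description reports to significantly impact diversity, its S is an upper bound on the entropy of any model that reproduces the same column frequencies while also fitting pairwise statistics.

```python
import math
from collections import Counter


def column_entropy(column):
    """Plug-in Shannon entropy (in nats) of the residues in one alignment column."""
    counts = Counter(column)
    total = len(column)
    return -sum((c / total) * math.log(c / total) for c in counts.values())


def independent_site_entropy(alignment):
    """Entropy of an independent-site maximum entropy model fitted to an MSA.

    With only single-column frequencies constrained, the maximum entropy
    distribution is a product over columns, so the total entropy is the
    sum of the per-column entropies.
    """
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    return sum(column_entropy([seq[i] for seq in alignment]) for i in range(length))


def effective_family_size(alignment):
    """Effective number of sequences, exp(S), under the independent-site model."""
    return math.exp(independent_site_entropy(alignment))


if __name__ == "__main__":
    # Hypothetical toy alignment: four sequences, three aligned columns.
    toy_msa = ["ACD", "ACE", "GCD", "GCE"]
    S = independent_site_entropy(toy_msa)
    print(f"S = {S:.4f} nats, exp(S) = {effective_family_size(toy_msa):.1f}")
```

For the toy alignment this prints `S = 1.3863 nats, exp(S) = 4.0`: two columns each contribute ln 2 and the fully conserved middle column contributes nothing. For a real alignment of length L, S would be compared with L ln q, the entropy of uniformly random sequences over an alphabet of q symbols (q = 21 if gaps are counted as a symbol, an assumption of this sketch).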
format Online
Article
Text
id pubmed-6733475
institution National Center for Biotechnology Information
language English
publishDate 2019
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-6733475 2019-09-20 Size and structure of the sequence space of repeat proteins Marchi, Jacopo Galpern, Ezequiel A. Espada, Rocio Ferreiro, Diego U. Walczak, Aleksandra M. Mora, Thierry PLoS Comput Biol Research Article
Public Library of Science 2019-08-15 /pmc/articles/PMC6733475/ /pubmed/31415557 http://dx.doi.org/10.1371/journal.pcbi.1007282 Text en © 2019 Marchi et al. http://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
title Size and structure of the sequence space of repeat proteins
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6733475/
https://www.ncbi.nlm.nih.gov/pubmed/31415557
http://dx.doi.org/10.1371/journal.pcbi.1007282