
On the Origin of Protein Superfamilies and Superfolds

Distributions of protein families and folds in genomes are highly skewed, having a small number of prevalent superfamilies/superfolds and a large number of families/folds of a small size. Why are the distributions of protein families and folds skewed? Why are there only a limited number of protein fa...


Bibliographic Details
Main authors: Magner, Abram, Szpankowski, Wojciech, Kihara, Daisuke
Format: Online Article Text
Language: English
Published: Nature Publishing Group 2015
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4336935/
https://www.ncbi.nlm.nih.gov/pubmed/25703447
http://dx.doi.org/10.1038/srep08166
author Magner, Abram
Szpankowski, Wojciech
Kihara, Daisuke
collection PubMed
description Distributions of protein families and folds in genomes are highly skewed, having a small number of prevalent superfamilies/superfolds and a large number of families/folds of a small size. Why are the distributions of protein families and folds skewed? Why are there only a limited number of protein families? Here, we employ an information theoretic approach to investigate the protein sequence-structure relationship that leads to the skewed distributions. We consider that protein sequences and folds constitute an information theoretic channel and compute the most efficient distribution of sequences that code all protein folds. The identified distributions of sequences and folds are found to follow a power law, consistent with those observed for proteins in nature. Importantly, the skewed distributions of sequences and folds are suggested to have different origins: the skewed distribution of sequences is due to evolutionary pressure to achieve efficient coding of necessary folds, whereas that of folds is based on the thermodynamic stability of folds. The current study provides a new information theoretic framework for proteins that could be widely applied for understanding protein sequences, structures, functions, and interactions.
format Online
Article
Text
id pubmed-4336935
institution National Center for Biotechnology Information
language English
publishDate 2015
publisher Nature Publishing Group
record_format MEDLINE/PubMed
spelling pubmed-4336935 2015-03-02 Sci Rep Article Nature Publishing Group 2015-02-23 /pmc/articles/PMC4336935/ /pubmed/25703447 http://dx.doi.org/10.1038/srep08166 Text en Copyright © 2015, Macmillan Publishers Limited. All rights reserved. This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
The images or other third party material in this article are included in the article's Creative Commons license, unless indicated otherwise in the credit line; if the material is not included under the Creative Commons license, users will need to obtain permission from the license holder in order to reproduce the material. To view a copy of this license, visit http://creativecommons.org/licenses/by-nc-sa/4.0/
title On the Origin of Protein Superfamilies and Superfolds
topic Article