On the Origin of Protein Superfamilies and Superfolds
Main authors: | Magner, Abram; Szpankowski, Wojciech; Kihara, Daisuke |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Nature Publishing Group, 2015 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4336935/ https://www.ncbi.nlm.nih.gov/pubmed/25703447 http://dx.doi.org/10.1038/srep08166 |
_version_ | 1782358526536974336 |
---|---|
author | Magner, Abram Szpankowski, Wojciech Kihara, Daisuke |
author_sort | Magner, Abram |
collection | PubMed |
description | Distributions of protein families and folds in genomes are highly skewed, having a small number of prevalent superfamilies/superfolds and a large number of families/folds of small size. Why are the distributions of protein families and folds skewed? Why are there only a limited number of protein families? Here, we employ an information-theoretic approach to investigate the protein sequence-structure relationship that leads to the skewed distributions. We consider that protein sequences and folds constitute an information-theoretic channel and compute the most efficient distribution of sequences that code all protein folds. The identified distributions of sequences and folds are found to follow a power law, consistent with those observed for proteins in nature. Importantly, the skewed distributions of sequences and folds are suggested to have different origins: the skewed distribution of sequences is due to evolutionary pressure to achieve efficient coding of necessary folds, whereas that of folds is based on the thermodynamic stability of folds. The current study provides a new information-theoretic framework for proteins that could be widely applied for understanding protein sequences, structures, functions, and interactions. |
format | Online Article Text |
id | pubmed-4336935 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2015 |
publisher | Nature Publishing Group |
record_format | MEDLINE/PubMed |
spelling | pubmed-43369352015-03-02 On the Origin of Protein Superfamilies and Superfolds Magner, Abram Szpankowski, Wojciech Kihara, Daisuke Sci Rep Article Distributions of protein families and folds in genomes are highly skewed, having a small number of prevalent superfamilies/superfolds and a large number of families/folds of small size. Why are the distributions of protein families and folds skewed? Why are there only a limited number of protein families? Here, we employ an information-theoretic approach to investigate the protein sequence-structure relationship that leads to the skewed distributions. We consider that protein sequences and folds constitute an information-theoretic channel and compute the most efficient distribution of sequences that code all protein folds. The identified distributions of sequences and folds are found to follow a power law, consistent with those observed for proteins in nature. Importantly, the skewed distributions of sequences and folds are suggested to have different origins: the skewed distribution of sequences is due to evolutionary pressure to achieve efficient coding of necessary folds, whereas that of folds is based on the thermodynamic stability of folds. The current study provides a new information-theoretic framework for proteins that could be widely applied for understanding protein sequences, structures, functions, and interactions. Nature Publishing Group 2015-02-23 /pmc/articles/PMC4336935/ /pubmed/25703447 http://dx.doi.org/10.1038/srep08166 Text en Copyright © 2015, Macmillan Publishers Limited. All rights reserved http://creativecommons.org/licenses/by-nc-sa/4.0/ This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License. 
The images or other third party material in this article are included in the article's Creative Commons license, unless indicated otherwise in the credit line; if the material is not included under the Creative Commons license, users will need to obtain permission from the license holder in order to reproduce the material. To view a copy of this license, visit http://creativecommons.org/licenses/by-nc-sa/4.0/ |
spellingShingle | Article Magner, Abram Szpankowski, Wojciech Kihara, Daisuke On the Origin of Protein Superfamilies and Superfolds |
title | On the Origin of Protein Superfamilies and Superfolds |
title_sort | on the origin of protein superfamilies and superfolds |
topic | Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4336935/ https://www.ncbi.nlm.nih.gov/pubmed/25703447 http://dx.doi.org/10.1038/srep08166 |
work_keys_str_mv | AT magnerabram ontheoriginofproteinsuperfamiliesandsuperfolds AT szpankowskiwojciech ontheoriginofproteinsuperfamiliesandsuperfolds AT kiharadaisuke ontheoriginofproteinsuperfamiliesandsuperfolds |
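The abstract treats protein sequences as the inputs and folds as the outputs of an information-theoretic channel, and speaks of computing "the most efficient distribution of sequences that code all protein folds." The record does not say how that distribution was obtained. One standard tool for finding the capacity-achieving input distribution of a discrete memoryless channel is the Blahut-Arimoto algorithm, sketched below as a hypothetical illustration only. It assumes a toy channel matrix `W[x][y] = P(fold y | sequence x)`; the function name `blahut_arimoto` and both example matrices are invented for illustration and are not the authors' model or data.

```python
import math


def blahut_arimoto(W, tol=1e-12, max_iter=10_000):
    """Hypothetical sketch: capacity (in bits) and capacity-achieving input
    distribution of a discrete memoryless channel, with W[x][y] = P(y | x).

    In the toy reading used here, x indexes sequences and y indexes folds.
    """
    n_in, n_out = len(W), len(W[0])
    p = [1.0 / n_in] * n_in  # start from the uniform input distribution
    lower = 0.0
    for _ in range(max_iter):
        # Output (fold) distribution induced by the current input distribution.
        r = [sum(p[x] * W[x][y] for x in range(n_in)) for y in range(n_out)]
        # c[x] = exp(KL divergence of W(.|x) from r), in nats; 0*log(0) is skipped.
        c = [
            math.exp(sum(W[x][y] * math.log(W[x][y] / r[y])
                         for y in range(n_out) if W[x][y] > 0))
            for x in range(n_in)
        ]
        s = sum(p[x] * c[x] for x in range(n_in))
        lower = math.log(s)       # lower bound on capacity (nats)
        upper = math.log(max(c))  # upper bound on capacity (nats)
        # Multiplicative update: reweight inputs that carry more information.
        p = [p[x] * c[x] / s for x in range(n_in)]
        if upper - lower < tol:
            break
    return lower / math.log(2), p


if __name__ == "__main__":
    # Hypothetical 2-sequence, 2-fold channel: sequence 0 always adopts fold 0,
    # sequence 1 adopts fold 0 or fold 1 with equal probability.
    cap, p = blahut_arimoto([[1.0, 0.0], [0.5, 0.5]])
    print(f"capacity = {cap:.4f} bits, p = {[round(v, 3) for v in p]}")
    # prints: capacity = 0.3219 bits, p = [0.6, 0.4]
```

The multiplicative update shifts probability toward inputs whose output distribution is most distinguishable from the average, which is why the optimal input distribution can be non-uniform (here 0.6 versus 0.4) even when the channel has only two inputs. Whether the paper's power-law sequence distribution arises from this or a different procedure is not stated in this record.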