
Percolation in protein sequence space

The currently known protein sequences are not distributed equally in sequence space, but cluster into families. Analyzing the cluster size distribution gives a glimpse of the large and unknown extant protein sequence space, which has been explored during evolution. For six protein superfamilies with...


Bibliographic Details
Main Authors:	Buchholz, Patrick C. F.; Fademrecht, Silvia; Pleiss, Jürgen
Format:	Online Article Text
Language:	English
Published:	Public Library of Science 2017
Subjects:	Research Article
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5738032/ https://www.ncbi.nlm.nih.gov/pubmed/29261740 http://dx.doi.org/10.1371/journal.pone.0189646

author	Buchholz, Patrick C. F.; Fademrecht, Silvia; Pleiss, Jürgen
collection	PubMed
description	The currently known protein sequences are not distributed equally in sequence space, but cluster into families. Analyzing the cluster size distribution gives a glimpse of the large and unknown extant protein sequence space, which has been explored during evolution. For six protein superfamilies with different fold and function, the cluster size distributions followed a power law with slopes between 2.4 and 3.3, which represent upper limits to the cluster distribution of extant sequences. The power law distribution of cluster sizes is in accordance with percolation theory and strongly supports connectedness of extant sequence space. Percolation of extant sequence space has three major consequences: (1) It transforms our view of sequence space as a highly connected network where each sequence has multiple neighbors, and each pair of sequences is connected by many different paths. A high degree of connectedness is a necessary condition of efficient evolution, because it overcomes the possible blockage by sign epistasis and reciprocal sign epistasis. (2) The Fisher exponent is an indicator of connectedness and saturation of sequence space of each protein superfamily. (3) All clusters are expected to be connected by extant sequences that become apparent as a higher portion of extant sequence space becomes known. Being linked to biochemically distinct homologous families, bridging sequences are promising enzyme candidates for applications in biotechnology because they are expected to have substrate ambiguity or catalytic promiscuity.
format	Online Article Text
id	pubmed-5738032
institution	National Center for Biotechnology Information
language	English
publishDate	2017
publisher	Public Library of Science
record_format	MEDLINE/PubMed
spelling	pubmed-5738032 2017-12-29 Percolation in protein sequence space Buchholz, Patrick C. F.; Fademrecht, Silvia; Pleiss, Jürgen PLoS One Research Article Public Library of Science 2017-12-20 /pmc/articles/PMC5738032/ /pubmed/29261740 http://dx.doi.org/10.1371/journal.pone.0189646 Text en © 2017 Buchholz et al. http://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
title	Percolation in protein sequence space
topic	Research Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5738032/ https://www.ncbi.nlm.nih.gov/pubmed/29261740 http://dx.doi.org/10.1371/journal.pone.0189646
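
The abstract reports that cluster sizes follow a power law, n(s) ~ s^(-tau), with slopes between 2.4 and 3.3. The record does not say how the authors fitted those slopes, so the following is only a minimal sketch of one common approach, an approximate maximum-likelihood estimate for integer-valued data. It assumes the reported slope plays the role of the Fisher exponent tau. The function names, the cutoff `s_min`, and the synthetic data are hypothetical and are not taken from the article.

```python
import math
import random


def estimate_fisher_exponent(cluster_sizes, s_min=10):
    """Approximate maximum-likelihood estimate of tau in n(s) ~ s**(-tau).

    Assumption (not from the article): uses the continuous approximation
    for integer data, tau ~= 1 + n / sum(ln(s_i / (s_min - 0.5))),
    applied only to clusters with size >= s_min.
    Returns (tau, standard_error).
    """
    tail = [s for s in cluster_sizes if s >= s_min]
    if len(tail) < 2:
        raise ValueError("need at least two clusters of size >= s_min")
    log_sum = sum(math.log(s / (s_min - 0.5)) for s in tail)
    tau = 1.0 + len(tail) / log_sum
    std_err = (tau - 1.0) / math.sqrt(len(tail))
    return tau, std_err


def sample_cluster_sizes(tau, n, s_min=10, seed=0):
    """Draw n synthetic integer cluster sizes from an approximate power law.

    Hypothetical helper for illustration: rounds samples of a continuous
    power law that starts at s_min - 0.5.
    """
    rng = random.Random(seed)
    return [
        int((s_min - 0.5) * (1.0 - rng.random()) ** (-1.0 / (tau - 1.0)) + 0.5)
        for _ in range(n)
    ]


# Synthetic check with an exponent inside the 2.4-3.3 range quoted above.
sizes = sample_cluster_sizes(tau=2.8, n=20000)
tau_hat, err = estimate_fisher_exponent(sizes)
print(f"estimated tau = {tau_hat:.2f} +/- {err:.2f}")
```

On real data, `cluster_sizes` would be the list of family or cluster sizes obtained by clustering the sequences of one superfamily. The choice of `s_min` affects the estimate, and the approximation used here becomes less accurate for very small cutoffs.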


Similar Items