
The scale-free nature of protein sequence space

The sequence space of five protein superfamilies was investigated by constructing sequence networks. The nodes represent individual sequences, and two nodes are connected by an edge if the global sequence identity of two sequences exceeds a threshold. The networks were characterized by their degree...


Bibliographic Details
Main authors: Buchholz, Patrick C. F., Zeil, Catharina, Pleiss, Jürgen
Format: Online Article Text
Language: English
Published: Public Library of Science 2018
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6070207/
https://www.ncbi.nlm.nih.gov/pubmed/30067815
http://dx.doi.org/10.1371/journal.pone.0200815
_version_ 1783343637213478912
author Buchholz, Patrick C. F.
Zeil, Catharina
Pleiss, Jürgen
author_facet Buchholz, Patrick C. F.
Zeil, Catharina
Pleiss, Jürgen
author_sort Buchholz, Patrick C. F.
collection PubMed
description The sequence space of five protein superfamilies was investigated by constructing sequence networks. The nodes represent individual sequences, and two nodes are connected by an edge if the global sequence identity of two sequences exceeds a threshold. The networks were characterized by their degree distribution (number of nodes with a given number of neighbors) and by their fractal network dimension. Although the five protein families differed in sequence length, fold, and domain arrangement, their network properties were similar. The fractal network dimension D(f) was distance-dependent: a high dimension for single and double mutants (D(f) = 4.0), which dropped to D(f) = 0.7–1.0 at 90% sequence identity, and increased to D(f) = 3.5–4.5 below 70% sequence identity. The distance dependency of the network dimension is consistent with evolutionary constraints for functional proteins. While random single and double mutations often result in a functional protein, the accumulation of more than ten mutations is dominated by epistasis. The networks of the five protein families were highly inhomogeneous with few highly connected communities ("hub sequences") and a large number of smaller and less connected communities. The degree distributions followed a power-law distribution with similar scaling exponents close to 1. Because the hub sequences have a large number of functional neighbors, they are expected to be robust toward possible deleterious effects of mutations. Because of their robustness, hub sequences have the potential of high innovability, with additional mutations readily inducing new functions. Therefore, they form hotspots of evolution and are promising candidates as starting points for directed evolution experiments in biotechnology.
format Online
Article
Text
id pubmed-6070207
institution National Center for Biotechnology Information
language English
publishDate 2018
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-6070207 2018-08-09 The scale-free nature of protein sequence space Buchholz, Patrick C. F. Zeil, Catharina Pleiss, Jürgen PLoS One Research Article
Public Library of Science 2018-08-01 /pmc/articles/PMC6070207/ /pubmed/30067815 http://dx.doi.org/10.1371/journal.pone.0200815 Text en © 2018 Buchholz et al http://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/) , which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
spellingShingle Research Article
Buchholz, Patrick C. F.
Zeil, Catharina
Pleiss, Jürgen
The scale-free nature of protein sequence space
title The scale-free nature of protein sequence space
title_sort scale-free nature of protein sequence space
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6070207/
https://www.ncbi.nlm.nih.gov/pubmed/30067815
http://dx.doi.org/10.1371/journal.pone.0200815
work_keys_str_mv AT buchholzpatrickcf thescalefreenatureofproteinsequencespace
AT zeilcatharina thescalefreenatureofproteinsequencespace
AT pleissjurgen thescalefreenatureofproteinsequencespace
AT buchholzpatrickcf scalefreenatureofproteinsequencespace
AT zeilcatharina scalefreenatureofproteinsequencespace
AT pleissjurgen scalefreenatureofproteinsequencespace
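The description field defines the sequence networks in words: nodes are sequences, and two nodes share an edge if their global sequence identity exceeds a threshold. The following is a minimal sketch of that construction and of the degree distribution, not the authors' pipeline. It assumes the sequences are already aligned to equal length, so identity is simply the fraction of matching positions (the record does not say how global identity was computed). The toy sequences, the function names, and the thresholds 0.85 and 0.75 are illustrative only.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, Set


def identity(a: str, b: str) -> float:
    """Fraction of identical positions (assumption: pre-aligned, equal length)."""
    if len(a) != len(b):
        raise ValueError("sequences must be pre-aligned to equal length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def build_network(seqs: Dict[str, str], threshold: float) -> Dict[str, Set[str]]:
    """Connect two sequences by an edge if their identity exceeds the threshold."""
    neighbors: Dict[str, Set[str]] = {name: set() for name in seqs}
    for u, v in combinations(seqs, 2):
        if identity(seqs[u], seqs[v]) > threshold:
            neighbors[u].add(v)
            neighbors[v].add(u)
    return neighbors


def degree_distribution(neighbors: Dict[str, Set[str]]) -> Counter:
    """Map each degree k to the number of nodes with exactly k neighbors."""
    return Counter(len(n) for n in neighbors.values())


# Hypothetical toy data: s2 and s3 are single mutants of s1, s4 is a double mutant.
toy = {
    "s1": "MKTAYIAKQR",
    "s2": "MKTAYIAKQK",
    "s3": "MKTAYIAKHR",
    "s4": "MKSAYLAKQR",
    "s5": "GGGGGGGGGG",
}

network = build_network(toy, threshold=0.85)
distribution = degree_distribution(network)
# The node with the most neighbors plays the role of a "hub sequence".
hub = max(network, key=lambda name: len(network[name]))

print(dict(distribution))
print(hub)
```

Lowering the threshold admits more distant sequence pairs as neighbors, which is how the distance dependence mentioned in the description can be explored: rebuild the network at each identity cutoff and compare the resulting degree distributions.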