The scale-free nature of protein sequence space
The sequence space of five protein superfamilies was investigated by constructing sequence networks. The nodes represent individual sequences, and two nodes are connected by an edge if the global sequence identity of two sequences exceeds a threshold. The networks were characterized by their degree...
Main authors: | Buchholz, Patrick C. F.; Zeil, Catharina; Pleiss, Jürgen |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Public Library of Science, 2018 |
Subjects: | Research Article |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6070207/ https://www.ncbi.nlm.nih.gov/pubmed/30067815 http://dx.doi.org/10.1371/journal.pone.0200815 |
_version_ | 1783343637213478912 |
---|---|
author | Buchholz, Patrick C. F. Zeil, Catharina Pleiss, Jürgen |
author_sort | Buchholz, Patrick C. F. |
collection | PubMed |
description | The sequence space of five protein superfamilies was investigated by constructing sequence networks. The nodes represent individual sequences, and two nodes are connected by an edge if the global sequence identity of two sequences exceeds a threshold. The networks were characterized by their degree distribution (number of nodes with a given number of neighbors) and by their fractal network dimension. Although the five protein families differed in sequence length, fold, and domain arrangement, their network properties were similar. The fractal network dimension D(f) was distance-dependent: a high dimension for single and double mutants (D(f) = 4.0), which dropped to D(f) = 0.7–1.0 at 90% sequence identity, and increased to D(f) = 3.5–4.5 below 70% sequence identity. The distance dependency of the network dimension is consistent with evolutionary constraints for functional proteins. While random single and double mutations often result in a functional protein, the accumulation of more than ten mutations is dominated by epistasis. The networks of the five protein families were highly inhomogeneous with few highly connected communities ("hub sequences") and a large number of smaller and less connected communities. The degree distributions followed a power-law distribution with similar scaling exponents close to 1. Because the hub sequences have a large number of functional neighbors, they are expected to be robust toward possible deleterious effects of mutations. Because of their robustness, hub sequences have the potential of high innovability, with additional mutations readily inducing new functions. Therefore, they form hotspots of evolution and are promising candidates as starting points for directed evolution experiments in biotechnology. |
format | Online Article Text |
id | pubmed-6070207 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2018 |
publisher | Public Library of Science |
record_format | MEDLINE/PubMed |
spelling | pubmed-6070207 2018-08-09 The scale-free nature of protein sequence space Buchholz, Patrick C. F. Zeil, Catharina Pleiss, Jürgen PLoS One Research Article
Public Library of Science 2018-08-01 /pmc/articles/PMC6070207/ /pubmed/30067815 http://dx.doi.org/10.1371/journal.pone.0200815 Text en © 2018 Buchholz et al http://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. |
title | The scale-free nature of protein sequence space |
topic | Research Article |
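The description in this record defines a sequence network: nodes are sequences, an edge joins two sequences whose global sequence identity exceeds a threshold, and the degree distribution counts the nodes that have a given number of neighbors. The following is a minimal sketch of that construction, not the authors' implementation. It assumes the sequences are already aligned to equal length, so identity is simply the fraction of matching positions. The toy sequences, the 0.85 threshold, and the function names `identity`, `build_network`, and `degree_distribution` are all hypothetical.

```python
from collections import Counter
from itertools import combinations


def identity(a: str, b: str) -> float:
    """Fraction of identical positions between two equal-length sequences.

    Assumption: the sequences are pre-aligned; no alignment is computed here.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be pre-aligned to equal length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def build_network(seqs: dict[str, str], threshold: float) -> dict[str, set[str]]:
    """Return adjacency sets; an edge joins sequences with identity > threshold."""
    adj: dict[str, set[str]] = {name: set() for name in seqs}
    for u, v in combinations(seqs, 2):
        if identity(seqs[u], seqs[v]) > threshold:
            adj[u].add(v)
            adj[v].add(u)
    return adj


def degree_distribution(adj: dict[str, set[str]]) -> dict[int, int]:
    """Map each degree k to the number of nodes with exactly k neighbors."""
    return dict(Counter(len(neighbors) for neighbors in adj.values()))


# Hypothetical toy data: one central sequence, three single mutants of it,
# and one distant sequence that differs at five of ten positions.
toy = {
    "hub": "ACDEFGHIKL",
    "m1": "ACDEFGHIKV",
    "m2": "ACDEFGHIRL",
    "m3": "SCDEFGHIKL",
    "far": "WWWWWGHIKL",
}

network = build_network(toy, threshold=0.85)
print(degree_distribution(network))
```

In this toy network the single mutants each connect only to the central sequence, because any two of them differ at two positions (identity 0.8), so the central sequence is the only node with more than one neighbor. This mirrors, on a very small scale, the "hub sequence" pattern the description reports.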