Birth and death of protein domains: A simple model of evolution explains power law behavior

BACKGROUND: Power distributions appear in numerous biological, physical and other contexts, which appear to be fundamentally different. In biology, power laws have been claimed to describe the distributions of the connections of enzymes and metabolites in metabolic networks, the number of interactio...

Bibliographic Details
Main Authors: Karev, Georgy P, Wolf, Yuri I, Rzhetsky, Andrey Y, Berezovskaya, Faina S, Koonin, Eugene V
Format: Text
Language: English
Published: BioMed Central 2002
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC137606/
https://www.ncbi.nlm.nih.gov/pubmed/12379152
http://dx.doi.org/10.1186/1471-2148-2-18
_version_ 1782120457690939392
author Karev, Georgy P
Wolf, Yuri I
Rzhetsky, Andrey Y
Berezovskaya, Faina S
Koonin, Eugene V
collection PubMed
description BACKGROUND: Power law distributions appear in numerous biological, physical and other contexts that seem fundamentally different. In biology, power laws have been claimed to describe the distributions of the connections of enzymes and metabolites in metabolic networks, the number of interaction partners of a given protein, the number of members in paralogous families, and other quantities. In network analysis, power laws imply evolution of the network with preferential attachment, i.e. a greater likelihood of nodes being added to pre-existing hubs. Exploring different types of evolutionary models to determine which of them lead to power law distributions has the potential to reveal non-trivial aspects of genome evolution. RESULTS: A simple model of evolution of the domain composition of proteomes was developed, with the following elementary processes: i) domain birth (duplication with divergence), ii) death (inactivation and/or deletion), and iii) innovation (emergence from non-coding or non-globular sequences or acquisition via horizontal gene transfer). This formalism can be described as a birth, death and innovation model (BDIM). The formulas for the equilibrium frequencies of domain families of different size and for the total number of families at equilibrium are derived for a general BDIM. All asymptotics of the equilibrium frequencies of domain families possible for this type of model are found, and their dependence on the model parameters is investigated. It is proved that power law asymptotics appear if, and only if, the model is balanced, i.e. domain duplication and deletion rates are asymptotically equal up to the second order. It is further proved that any power asymptotic with a degree not equal to -1 can appear only if the hypothesis that the duplication/deletion rates are independent of the size of a domain family is rejected. Specific cases of BDIMs, namely simple, linear, polynomial and rational models, are considered in detail, and the distributions of the equilibrium frequencies of domain families of different size are determined for each case. We apply the BDIM formalism to the analysis of the domain family size distributions in prokaryotic and eukaryotic proteomes and show an excellent fit between these empirical data and a particular form of the model, the second-order balanced linear BDIM. Calculation of the parameters of these models suggests surprisingly high innovation rates, comparable to the total domain birth (duplication) and elimination rates, particularly for prokaryotic genomes. CONCLUSIONS: We show that a straightforward model of genome evolution, which does not explicitly include selection, is sufficient to explain the observed distributions of domain family sizes, in which power laws appear as asymptotics. However, for the model to be compatible with the data, there has to be a precise balance between domain birth, death and innovation rates, and this is likely to be maintained by selection. The developed approach is oriented toward a mathematical description of the evolution of the domain composition of proteomes, but a simple reformulation could be applied to models of other evolving networks with preferential attachment.
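The balance condition described in the abstract can be illustrated numerically. The following is a minimal Python sketch, not the authors' implementation. It assumes a particular reading of the model that the abstract does not spell out: innovation adds families of size 1 at rate nu, a family of size i grows at rate birth(i) and shrinks at rate death(i), and a family of size 1 that loses its last member disappears. Under these assumptions the equilibrium satisfies nu = death(1) * f[1] and birth(i-1) * f[i-1] = death(i) * f[i]. The linear rates and the values of LAM, A, B and NU are hypothetical and chosen only for illustration.

```python
import math

def equilibrium_frequencies(birth, death, nu, n_max):
    """Equilibrium counts f[1..n_max] of families of each size.

    Assumed dynamics (an illustration, not taken from the record):
    innovation feeds size-1 families at rate nu, and at equilibrium the
    flux between neighbouring size classes balances.
    """
    f = [0.0] * (n_max + 1)          # index 0 is unused
    f[1] = nu / death(1)             # innovation balances loss of size-1 families
    for i in range(2, n_max + 1):
        f[i] = f[i - 1] * birth(i - 1) / death(i)
    return f

def net_flux(f, birth, death, nu, i):
    """Time derivative of f[i] under the assumed dynamics (zero at equilibrium)."""
    n_max = len(f) - 1
    gain = nu if i == 1 else birth(i - 1) * f[i - 1]
    gain += death(i + 1) * f[i + 1] if i < n_max else 0.0
    loss = death(i) * f[i] + (birth(i) * f[i] if i < n_max else 0.0)
    return gain - loss

# Hypothetical linear rates with equal leading coefficients (a balanced model).
LAM, A, B, NU = 1.0, 0.5, 1.5, 10.0
birth = lambda i: LAM * (i + A)
death = lambda i: LAM * (i + B)

f = equilibrium_frequencies(birth, death, NU, 2000)

# Local log-log slope of the tail; for these rates it should approach A - B - 1.
slope = math.log(f[2000] / f[1000]) / math.log(2.0)
print(f"tail slope = {slope:.4f}, expected about {A - B - 1:.1f}")
```

With equal leading coefficients the tail follows a power law whose exponent is set by the constant terms A and B. Replacing `death` with a rate that has a larger leading coefficient, for example `1.1 * (i + B)`, should instead produce a geometrically decaying tail, which is consistent with the abstract's statement that power laws require a balanced model.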
format Text
id pubmed-137606
institution National Center for Biotechnology Information
language English
publishDate 2002
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-137606 2002-12-08 Birth and death of protein domains: A simple model of evolution explains power law behavior Karev, Georgy P Wolf, Yuri I Rzhetsky, Andrey Y Berezovskaya, Faina S Koonin, Eugene V BMC Evol Biol Research Article BioMed Central 2002-10-14 /pmc/articles/PMC137606/ /pubmed/12379152 http://dx.doi.org/10.1186/1471-2148-2-18 Text en Copyright © 2002 Karev et al; licensee BioMed Central Ltd. This is an Open Access article: verbatim copying and redistribution of this article are permitted in all media for any purpose, provided this notice is preserved along with the article's original URL.
title Birth and death of protein domains: A simple model of evolution explains power law behavior
topic Research Article