
Estimation of prokaryotic supergenome size and composition from gene frequency distributions

BACKGROUND: Because prokaryotic genomes experience a rapid flux of genes, selection may act at a higher level than an individual genome. We explore a quantitative model of the distributed genome whereby groups of genomes evolve by acquiring genes from a fixed reservoir which we denote as supergenome...


Bibliographic Details
Main authors: Lobkovsky, Alexander E, Wolf, Yuri I, Koonin, Eugene V
Format: Online Article Text
Language: English
Published: BioMed Central 2014
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4240607/
https://www.ncbi.nlm.nih.gov/pubmed/25572821
http://dx.doi.org/10.1186/1471-2164-15-S6-S14
author Lobkovsky, Alexander E
Wolf, Yuri I
Koonin, Eugene V
collection PubMed
description BACKGROUND: Because prokaryotic genomes experience a rapid flux of genes, selection may act at a higher level than an individual genome. We explore a quantitative model of the distributed genome whereby groups of genomes evolve by acquiring genes from a fixed reservoir which we denote as supergenome. Previous attempts to understand the nature of the supergenome treated genomes as random, independent collections of genes and assumed that the supergenome consists of a small number of homogeneous sub-reservoirs. Here we explore the consequences of relaxing both assumptions. RESULTS: We surveyed several methods for estimating the size and composition of the supergenome. The methods assumed that genomes were either random, independent samples of the supergenome or that they evolved from a common ancestor along a known tree via stochastic sampling from the reservoir. The reservoir was assumed to be either a collection of homogeneous sub-reservoirs or alternatively composed of genes with Gamma distributed gain probabilities. Empirical gene frequencies were used to either compute the likelihood of the data directly or first to reconstruct the history of gene gains and then compute the likelihood of the reconstructed numbers of gains. CONCLUSIONS: Supergenome size estimates using the empirical gene frequencies directly are not robust with respect to the choice of the model. By contrast, using the gene frequencies and the phylogenetic tree to reconstruct multiple gene gains produces reliable estimates of the supergenome size and indicates that a homogeneous supergenome is more consistent with the data than a supergenome with Gamma distributed gain probabilities.
format Online
Article
Text
id pubmed-4240607
institution National Center for Biotechnology Information
language English
publishDate 2014
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-4240607 2014-11-25 Estimation of prokaryotic supergenome size and composition from gene frequency distributions Lobkovsky, Alexander E Wolf, Yuri I Koonin, Eugene V BMC Genomics Research
BioMed Central 2014-10-17 /pmc/articles/PMC4240607/ /pubmed/25572821 http://dx.doi.org/10.1186/1471-2164-15-S6-S14 Text en Copyright © 2014 Lobkovsky et al.; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/4.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title Estimation of prokaryotic supergenome size and composition from gene frequency distributions
topic Research