Origin and evolution of gene families in Bacteria and Archaea

BACKGROUND: Comparison of complete genomes of Bacteria and Archaea shows that gene content varies considerably and that genomes evolve quite rapidly via gene duplication and deletion and horizontal gene transfer. We analyze a diverse set of 92 Bacteria and 79 Archaea in order to investigate the proc...

Bibliographic Details
Main authors: Collins, R Eric, Merz, Hugh, Higgs, Paul G
Format: Online Article Text
Language: English
Published: BioMed Central 2011
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3283305/
https://www.ncbi.nlm.nih.gov/pubmed/22151831
http://dx.doi.org/10.1186/1471-2105-12-S9-S14
author Collins, R Eric
Merz, Hugh
Higgs, Paul G
collection PubMed
description BACKGROUND: Comparison of complete genomes of Bacteria and Archaea shows that gene content varies considerably and that genomes evolve quite rapidly via gene duplication and deletion and horizontal gene transfer. We analyze a diverse set of 92 Bacteria and 79 Archaea in order to investigate the processes governing the origin and evolution of families of related genes within genomes. RESULTS: Genes were clustered into related groups using similarity criteria derived from BLAST. Most clusters contained genes from only one or a small number of genomes, and relatively few core clusters were found that spanned all genomes. Gene clusters found in larger numbers of genomes tended to have larger numbers of genes per genome; however, clusters with unusually large numbers of genes per genome were found among both narrowly and widely distributed clusters. Larger genomes were found to have larger mean gene family sizes and a greater proportion of families of very large size. We used a model of birth, death, and innovation to predict the distribution of gene family sizes. The key parameter is r, the ratio of duplications to deletions. It was found that the model can give a good fit to the observed distribution only if there are several classes of genes with different values of r. The preferred model in most cases had three classes of genes. CONCLUSIONS: There appears to be a rapid rate of origination of new gene families within individual genomes. Most of these gene families are deleted before they spread to large numbers of genomes, which suggests that they may not be generally beneficial to the organisms. The family size distribution is best described by a large fraction of families that tend to have only one or two genes and a small fraction of families of multi-copy genes that are highly prone to duplication. 
Larger families occur more frequently in larger genomes, indicating higher r in these genomes, possibly due to a greater tolerance for non-beneficial gene duplicates. The smallest genomes contain very few multi-copy families, suggesting a high rate of deletion of all but the most beneficial genes in these genomes.
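The abstract names a birth, death, and innovation model whose key parameter is r, the ratio of duplications to deletions, and reports that several gene classes with different r are needed to fit the data. The record does not give the model's equations, so the following is only a minimal sketch under a stated assumption: a simple linear model in which each gene duplicates and is deleted at per-gene rates with ratio r < 1, which yields a stationary family size distribution P(n) proportional to r^n / n. The function names, the truncation at `max_size`, and the three example classes (fractions and r values) are hypothetical illustrations, not values from the article.

```python
def family_size_pmf(r, max_size):
    """Stationary family-size distribution for an ASSUMED linear
    birth-death-innovation model: P(n) proportional to r**n / n,
    truncated at max_size and renormalized.

    r is the ratio of duplication rate to deletion rate (0 < r < 1).
    """
    if not 0.0 < r < 1.0:
        raise ValueError("r must lie strictly between 0 and 1")
    weights = [r ** n / n for n in range(1, max_size + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def mixture_pmf(classes, max_size):
    """Mix several gene classes, each given as (fraction, r).

    Fractions are renormalized so they sum to 1.
    """
    total_fraction = sum(fraction for fraction, _ in classes)
    mixed = [0.0] * max_size
    for fraction, r in classes:
        pmf = family_size_pmf(r, max_size)
        for i, p in enumerate(pmf):
            mixed[i] += (fraction / total_fraction) * p
    return mixed


def mean_size(pmf):
    """Mean family size; index 0 of pmf corresponds to size 1."""
    return sum((i + 1) * p for i, p in enumerate(pmf))


# Hypothetical three-class example: most families rarely duplicate,
# a small fraction is highly prone to duplication.
EXAMPLE_CLASSES = [(0.80, 0.2), (0.15, 0.7), (0.05, 0.95)]

if __name__ == "__main__":
    pmf = mixture_pmf(EXAMPLE_CLASSES, max_size=200)
    print("P(size 1 or 2):", pmf[0] + pmf[1])
    print("mean family size:", mean_size(pmf))
```

Under this assumed form, raising r in any class shifts weight toward larger families, which is the direction of the effect the abstract describes for larger genomes.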
format Online
Article
Text
id pubmed-3283305
institution National Center for Biotechnology Information
language English
publishDate 2011
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-3283305 2012-02-22 Origin and evolution of gene families in Bacteria and Archaea Collins, R Eric Merz, Hugh Higgs, Paul G BMC Bioinformatics Proceedings BioMed Central 2011-10-05 /pmc/articles/PMC3283305/ /pubmed/22151831 http://dx.doi.org/10.1186/1471-2105-12-S9-S14 Text en Copyright ©2011 Collins et al; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title Origin and evolution of gene families in Bacteria and Archaea
topic Proceedings