
Clusters of orthologous genes for 41 archaeal genomes and implications for evolutionary genomics of archaea

Bibliographic Details
Main authors: Makarova, Kira S, Sorokin, Alexander V, Novichkov, Pavel S, Wolf, Yuri I, Koonin, Eugene V
Format: Text
Language: English
Published: BioMed Central 2007
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2222616/
https://www.ncbi.nlm.nih.gov/pubmed/18042280
http://dx.doi.org/10.1186/1745-6150-2-33
collection PubMed
description BACKGROUND: An evolutionary classification of genes from sequenced genomes that distinguishes between orthologs and paralogs is indispensable for genome annotation and evolutionary reconstruction. Shortly after multiple genome sequences of bacteria, archaea, and unicellular eukaryotes became available, an attempt at such a classification was implemented in Clusters of Orthologous Groups of proteins (COGs). Rapid accumulation of genome sequences creates opportunities for refining COGs but also represents a challenge because of error amplification. One practical strategy involves the construction of refined COGs for phylogenetically compact subsets of genomes. RESULTS: New Archaeal Clusters of Orthologous Genes (arCOGs) were constructed for 41 archaeal genomes (13 Crenarchaeota, 27 Euryarchaeota and one Nanoarchaeon) using an improved procedure that employs a similarity tree between smaller, group-specific clusters, semi-automatically partitions orthology domains in multidomain proteins, and uses profile searches for identification of remote orthologs. The annotation of arCOGs is a consensus between three assignments based on the COGs, the CDD database, and the annotations of homologs in the NR database. The 7538 arCOGs, on average, cover ~88% of the genes in a genome, compared with ~76% coverage by the COGs. The finer granularity of ortholog identification in the arCOGs is apparent from the fact that 4538 arCOGs correspond to 2362 COGs; ~40% of the arCOGs are new. The archaeal gene core (protein-coding genes found in all 41 genomes) consists of 166 arCOGs. The arCOGs were used to reconstruct gene loss and gene gain events during archaeal evolution and the gene sets of ancestral forms. The Last Archaeal Common Ancestor (LACA) is conservatively estimated to have possessed 996 genes, compared with 1245 and 1335 genes for the last common ancestors of Crenarchaeota and Euryarchaeota, respectively.
It is inferred that LACA was a chemoautotrophic hyperthermophile that, in addition to the core archaeal functions, encoded more idiosyncratic systems, e.g., the CASS systems of antivirus defense and some toxin-antitoxin systems. CONCLUSION: The arCOGs provide a convenient, flexible framework for functional annotation of archaeal genomes, comparative genomics and evolutionary reconstructions. Genomic reconstructions suggest that the last common ancestor of archaea might have been (nearly) as advanced as the modern archaeal hyperthermophiles. ArCOGs and related information are available at: . REVIEWERS: This article was reviewed by Peer Bork, Patrick Forterre, and Purificacion Lopez-Garcia.
id pubmed-2222616
institution National Center for Biotechnology Information
record_format MEDLINE/PubMed
spelling pubmed-2222616 2008-02-01 Biol Direct Research BioMed Central 2007-11-27 Text en Copyright © 2007 Makarova et al; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
topic Research