Comprehensive genome analysis of 203 genomes provides structural genomics with new insights into protein family space

We present an analysis of 203 completed genomes in the Gene3D resource (including 17 eukaryotes), which demonstrates that the number of protein families is continually expanding over time and that singleton-sequences appear to be an intrinsic part of the genomes. A significant proportion of the prot...

Bibliographic Details
Main authors: Marsden, Russell L., Lee, David, Maibaum, Michael, Yeats, Corin, Orengo, Christine A.
Format: Text
Language: English
Published: Oxford University Press 2006
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1373602/
https://www.ncbi.nlm.nih.gov/pubmed/16481312
http://dx.doi.org/10.1093/nar/gkj494
author Marsden, Russell L.
Lee, David
Maibaum, Michael
Yeats, Corin
Orengo, Christine A.
author_sort Marsden, Russell L.
collection PubMed
description We present an analysis of 203 completed genomes in the Gene3D resource (including 17 eukaryotes), which demonstrates that the number of protein families is continually expanding over time and that singleton-sequences appear to be an intrinsic part of the genomes. A significant proportion of the proteomes can be assigned to fewer than 6000 well-characterized domain families with the remaining domain-like regions belonging to a much larger number of small uncharacterized families that are largely species specific. Our comprehensive domain annotation of 203 genomes enables us to provide more accurate estimates of the number of multi-domain proteins found in the three kingdoms of life than previous calculations. We find that 67% of eukaryotic sequences are multi-domain compared with 56% of sequences in prokaryotes. By measuring the domain coverage of genome sequences, we show that the structural genomics initiatives should aim to provide structures for less than a thousand structurally uncharacterized Pfam families to achieve reasonable structural annotation of the genomes. However, in large families, additional structures should be determined as these would reveal more about the evolution of the family and enable a greater understanding of how function evolves.
format Text
id pubmed-1373602
institution National Center for Biotechnology Information
language English
publishDate 2006
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-1373602 2006-02-17 Nucleic Acids Res Article Oxford University Press 2006 2006-02-15 /pmc/articles/PMC1373602/ /pubmed/16481312 http://dx.doi.org/10.1093/nar/gkj494 Text en © The Author 2006. Published by Oxford University Press. All rights reserved
title Comprehensive genome analysis of 203 genomes provides structural genomics with new insights into protein family space
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1373602/
https://www.ncbi.nlm.nih.gov/pubmed/16481312
http://dx.doi.org/10.1093/nar/gkj494