Structural characterization of genomes by large scale sequence-structure threading

Bibliographic Details
Main authors: Cherkasov, Artem; Jones, Steven JM
Format: Text
Language: English
Published: BioMed Central 2004
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC419331/
https://www.ncbi.nlm.nih.gov/pubmed/15061866
http://dx.doi.org/10.1186/1471-2105-5-37
author Cherkasov, Artem
Jones, Steven JM
collection PubMed
description BACKGROUND: Using sequence-structure threading, we have conducted a structural characterization of the complete proteomes of 37 archaeal, bacterial and eukaryotic organisms (including worm, fly, mouse and human), totaling 167,888 genes. RESULTS: The reported data represent the first broad evaluation of the performance of full sequence-structure threading on multiple genomes, providing an opportunity to assess its general applicability to large-scale studies. According to the estimated results, sequence-structure threading assigned protein folds to more than 60% of eukaryotic, 68% of archaeal and 70% of bacterial proteomes. The repertoires of protein classes, architectures, topologies and homologous superfamilies (according to the CATH 2.4 classification) have been established for distant organisms and superkingdoms. The average abundance of CATH classes was found to decrease from "alpha and beta" to "mainly beta", followed by "mainly alpha" and "few secondary structures". The 3-Layer (aba) Sandwich was identified as the most abundant protein architecture and the Rossmann fold as the most common topology. CONCLUSION: The analysis of genomic occurrences of CATH 2.4 protein homologous superfamilies and topologies revealed the power-law character of their distributions. The corresponding double-logarithmic "frequency – genomic occurrence" dependences, characteristic of scale-free systems, have been established for individual organisms and for the three superkingdoms. Supplementary materials for this work are available at [1].
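The CATH hierarchy and the "frequency versus genomic occurrence" analysis described above can be illustrated with a short Python sketch. It is not the authors' pipeline. It assumes that each fold-assigned gene carries one four-part CATH code (3.40.50.720 is used here only as an illustrative identifier) and rolls codes up to class, architecture, topology or homologous superfamily by prefix. In CATH numbering, class 3 is Alpha Beta, architecture 3.40 is the 3-Layer(aba) Sandwich and topology 3.40.50 is the Rossmann fold. The sketch counts how often each entry occurs in a genome, tabulates how many entries share each occurrence count n, and fits log F = log C − γ log n by ordinary least squares. The helper names (cath_levels, occurrence_distribution, fit_power_law, synthetic_genome) and the toy genome are hypothetical. A log-log least-squares fit is a simple, fairly crude exponent estimator chosen for illustration; the record does not say which fitting method the paper used.

```python
import math
from collections import Counter

# CATH levels from coarsest to finest; each is a prefix of the dotted code.
LEVELS = ("class", "architecture", "topology", "homologous_superfamily")


def cath_levels(code: str) -> dict:
    """Split a four-part CATH code such as '3.40.50.720' into its level prefixes."""
    parts = code.split(".")
    if len(parts) != 4:
        raise ValueError(f"expected a four-part CATH code, got {code!r}")
    return {level: ".".join(parts[: i + 1]) for i, level in enumerate(LEVELS)}


def occurrence_distribution(assignments, level="homologous_superfamily"):
    """Return a Counter mapping genomic occurrence n -> number of entries seen n times.

    `assignments` holds one CATH code per fold-assigned gene of one genome
    (an assumed input format, for illustration only)."""
    occurrences = Counter(cath_levels(code)[level] for code in assignments)
    return Counter(occurrences.values())


def fit_power_law(distribution):
    """Least-squares fit of log F = log C - gamma * log n; returns (gamma, C)."""
    xs = [math.log(n) for n in distribution]
    ys = [math.log(f) for f in distribution.values()]
    mean_x, mean_y = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("need at least two distinct occurrence values")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return -slope, math.exp(mean_y - slope * mean_x)


def synthetic_genome():
    """Hypothetical toy genome built so that F(n) = 64 / n**2 exactly."""
    genes, superfamily_id = [], 0
    for n, count in [(1, 64), (2, 16), (4, 4), (8, 1)]:
        for _ in range(count):
            superfamily_id += 1
            # Made-up superfamily codes; only the dotted format matters here.
            genes += [f"3.40.50.{superfamily_id}"] * n
    return genes


if __name__ == "__main__":
    print(cath_levels("3.40.50.720")["topology"])  # 3.40.50
    dist = occurrence_distribution(synthetic_genome())
    print(dict(sorted(dist.items())))  # {1: 64, 2: 16, 4: 4, 8: 1}
    gamma, c = fit_power_law(dist)
    print(round(gamma, 6), round(c, 6))  # 2.0 64.0
```

Because the toy genome is constructed so that F(n) = 64/n², the fit recovers γ = 2 and C = 64. Real threading output would produce noisier distributions. Passing level="topology" instead of the default would tabulate topologies rather than homologous superfamilies.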
format Text
id pubmed-419331
institution National Center for Biotechnology Information
language English
publishDate 2004
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-419331 2004-05-26 BMC Bioinformatics Research Article BioMed Central 2004-04-03 Copyright © 2004 Cherkasov and Jones; licensee BioMed Central Ltd.
This is an Open Access article: verbatim copying and redistribution of this article are permitted in all media for any purpose, provided this notice is preserved along with the article's original URL.
title Structural characterization of genomes by large scale sequence-structure threading
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC419331/
https://www.ncbi.nlm.nih.gov/pubmed/15061866
http://dx.doi.org/10.1186/1471-2105-5-37