
Species-specific protein sequence and fold optimizations

BACKGROUND: An organism's ability to adapt to its particular environmental niche is of fundamental importance to its survival and proliferation. In the largest study of its kind, we sought to identify and exploit the amino-acid signatures that make species-specific protein adaptation possible a...


Bibliographic Details
Main authors: Dumontier, Michel, Michalickova, Katerina, Hogue, Christopher WV
Format: Text
Language: English
Published: BioMed Central 2002
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC139977/
https://www.ncbi.nlm.nih.gov/pubmed/12487631
http://dx.doi.org/10.1186/1471-2105-3-39
_version_ 1782120581220532224
author Dumontier, Michel
Michalickova, Katerina
Hogue, Christopher WV
author_facet Dumontier, Michel
Michalickova, Katerina
Hogue, Christopher WV
author_sort Dumontier, Michel
collection PubMed
description BACKGROUND: An organism's ability to adapt to its particular environmental niche is of fundamental importance to its survival and proliferation. In the largest study of its kind, we sought to identify and exploit the amino-acid signatures that make species-specific protein adaptation possible across 100 complete genomes. RESULTS: Environmental niche was determined to be a significant factor in variability from correspondence analysis using the amino acid composition of over 360,000 predicted open reading frames (ORFs) from 17 archaea, 76 bacteria and 7 eukaryote complete genomes. Additionally, we found clusters of phylogenetically unrelated archaea and bacteria that share similar environments by amino acid composition clustering. Composition analyses of conservative, domain-based homology modeling suggested an enrichment of small hydrophobic residues Ala, Gly, Val and charged residues Asp, Glu, His and Arg across all genomes. However, larger aromatic residues Phe, Trp and Tyr are reduced in folds, and these results were not affected by low complexity biases. We derived two simple log-odds scoring functions from ORFs (C(G)) and folds (C(F)) for each of the complete genomes. C(F) achieved an average cross-validation success rate of 85 ± 8% whereas the C(G) detected 73 ± 9% species-specific sequences when competing against all other non-redundant C(G). Continuously updated results are available at . CONCLUSION: Our analysis of amino acid compositions from the complete genomes provides stronger evidence for species-specific and environmental residue preferences in genomic sequences as well as in folds. Scoring functions derived from this work will be useful in future protein engineering experiments and possibly in identifying horizontal transfer events.
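The abstract mentions log-odds scoring functions derived from amino acid composition but does not give their form. The sketch below illustrates one plausible reading, not the authors' actual C(G) or C(F): it assumes each residue contributes the log ratio of its frequency in one species to its frequency in a pooled background, and that a sequence is assigned to the species whose table gives the highest length-normalized score. All function names, the pseudocount, and the toy sequences are hypothetical.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def composition(sequences, pseudocount=1.0):
    """Amino-acid frequencies over a set of sequences (pseudocount is an assumption)."""
    counts = Counter()
    for seq in sequences:
        counts.update(ch for ch in seq.upper() if ch in AMINO_ACIDS)
    total = sum(counts[a] for a in AMINO_ACIDS) + pseudocount * len(AMINO_ACIDS)
    return {a: (counts[a] + pseudocount) / total for a in AMINO_ACIDS}


def log_odds_table(species_freq, background_freq):
    """Per-residue log ratio of species frequency to background frequency."""
    return {a: math.log(species_freq[a] / background_freq[a]) for a in AMINO_ACIDS}


def score(seq, table):
    """Mean log-odds over the standard residues of seq; 0.0 if there are none."""
    residues = [ch for ch in seq.upper() if ch in AMINO_ACIDS]
    if not residues:
        return 0.0
    return sum(table[ch] for ch in residues) / len(residues)


def assign(seq, tables):
    """Name of the species whose table scores seq highest."""
    return max(tables, key=lambda name: score(seq, tables[name]))


# Toy data only: two made-up "genomes" with very different compositions.
genomes = {
    "species_a": ["AAAGGVAAG", "GAVAAG"],
    "species_b": ["FWYFWYKK", "WYFFKW"],
}
background = composition([s for seqs in genomes.values() for s in seqs])
tables = {
    name: log_odds_table(composition(seqs), background)
    for name, seqs in genomes.items()
}
```

With tables built this way, `assign("AAGAV", tables)` picks the Ala/Gly/Val-rich toy species and `assign("FWYW", tables)` picks the aromatic-rich one. Dividing by sequence length is a design choice made here so that long and short sequences are comparable; the source does not say whether the original functions normalize.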
format Text
id pubmed-139977
institution National Center for Biotechnology Information
language English
publishDate 2002
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-139977 2003-01-23 Species-specific protein sequence and fold optimizations Dumontier, Michel Michalickova, Katerina Hogue, Christopher WV BMC Bioinformatics Research Article BACKGROUND: An organism's ability to adapt to its particular environmental niche is of fundamental importance to its survival and proliferation. In the largest study of its kind, we sought to identify and exploit the amino-acid signatures that make species-specific protein adaptation possible across 100 complete genomes. RESULTS: Environmental niche was determined to be a significant factor in variability from correspondence analysis using the amino acid composition of over 360,000 predicted open reading frames (ORFs) from 17 archaea, 76 bacteria and 7 eukaryote complete genomes. Additionally, we found clusters of phylogenetically unrelated archaea and bacteria that share similar environments by amino acid composition clustering. Composition analyses of conservative, domain-based homology modeling suggested an enrichment of small hydrophobic residues Ala, Gly, Val and charged residues Asp, Glu, His and Arg across all genomes. However, larger aromatic residues Phe, Trp and Tyr are reduced in folds, and these results were not affected by low complexity biases. We derived two simple log-odds scoring functions from ORFs (C(G)) and folds (C(F)) for each of the complete genomes. C(F) achieved an average cross-validation success rate of 85 ± 8% whereas the C(G) detected 73 ± 9% species-specific sequences when competing against all other non-redundant C(G). Continuously updated results are available at . CONCLUSION: Our analysis of amino acid compositions from the complete genomes provides stronger evidence for species-specific and environmental residue preferences in genomic sequences as well as in folds. Scoring functions derived from this work will be useful in future protein engineering experiments and possibly in identifying horizontal transfer events.
BioMed Central 2002-12-17 /pmc/articles/PMC139977/ /pubmed/12487631 http://dx.doi.org/10.1186/1471-2105-3-39 Text en Copyright © 2002 Dumontier et al; licensee BioMed Central Ltd. This is an Open Access article: verbatim copying and redistribution of this article are permitted in all media for any purpose, provided this notice is preserved along with the article's original URL.
spellingShingle Research Article
Dumontier, Michel
Michalickova, Katerina
Hogue, Christopher WV
Species-specific protein sequence and fold optimizations
title Species-specific protein sequence and fold optimizations
title_full Species-specific protein sequence and fold optimizations
title_fullStr Species-specific protein sequence and fold optimizations
title_full_unstemmed Species-specific protein sequence and fold optimizations
title_short Species-specific protein sequence and fold optimizations
title_sort species-specific protein sequence and fold optimizations
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC139977/
https://www.ncbi.nlm.nih.gov/pubmed/12487631
http://dx.doi.org/10.1186/1471-2105-3-39
work_keys_str_mv AT dumontiermichel speciesspecificproteinsequenceandfoldoptimizations
AT michalickovakaterina speciesspecificproteinsequenceandfoldoptimizations
AT hoguechristopherwv speciesspecificproteinsequenceandfoldoptimizations