Characterization of oligopeptide patterns in large protein sets

Bibliographic Details
Main authors:	Bresell, Anders; Persson, Bengt
Format:	Text
Language:	English
Published:	BioMed Central 2007
Subjects:	Research Article
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2231379/ https://www.ncbi.nlm.nih.gov/pubmed/17908308 http://dx.doi.org/10.1186/1471-2164-8-346

Abstract

BACKGROUND: Recent sequencing projects and the growth of sequence data banks enable oligopeptide patterns to be characterized on a genome or kingdom level. Several studies have focused on kingdom or habitat classifications based on the abundance of short peptide patterns. There have also been efforts at local structural prediction based on short sequence motifs. Oligopeptide patterns undoubtedly carry valuable information content. Therefore, it is important to characterize these informational peptide patterns to shed light on possible new applications and on the pitfalls implicit in neglecting bias in peptide patterns.

RESULTS: We have studied four classes of pentapeptide patterns (designated POP, NEP, ORP and URP) in the kingdoms archaea, bacteria and eukaryotes. POP are highly abundant patterns that are statistically not expected to exist; NEP are patterns that do not exist but are statistically expected to; ORP are patterns unique to a kingdom; and URP are patterns excluded from a kingdom. We used two data sources: Swiss-Prot, the de facto standard of protein knowledge, and a set of 386 completely sequenced genomes. For each class of peptides we examined the 100 most extreme patterns and found both known and unknown sequence features. Most of the known sequence motifs can be explained by the protein families from which they originate.

CONCLUSION: We find an inherent bias of certain oligopeptide patterns in naturally occurring proteins that cannot be explained solely by residue distribution in single proteins, kingdoms or databases. We see three predominant categories of patterns: (i) patterns widespread in a kingdom, such as those originating from respiratory chain-associated proteins and the translation machinery; (ii) proteins with structurally and/or functionally favored patterns, which have not yet been ascribed this role; (iii) multicopy species-specific retrotransposons, found only in the genome set. These categories will affect the accuracy of sequence pattern algorithms that rely mainly on amino acid residue usage. The methods presented in this paper may be used to discover targets for antibiotics, as we identify numerous examples of kingdom-specific antigens among our peptide classes. The methods may also be useful for detecting coding regions of genes.
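
The four pattern classes defined in the abstract can be illustrated with a small sketch. It is not the authors' method: it assumes a zero-order model in which residues occur independently at their set-wide frequencies, scores POP-like patterns by an observed/expected ratio, reads "excluded from a kingdom" (URP) as "present in every other kingdom but absent from this one", and uses hypothetical function names (`count_kmers`, `pop_and_nep`, `orp_and_urp`). The statistics and thresholds used in the paper may differ.

```python
import heapq
from collections import Counter
from itertools import product
from math import prod


def count_kmers(sequences, k=5):
    """Count every overlapping k-mer (pentapeptides when k=5)."""
    counts = Counter()
    for seq in sequences:
        for i in range(len(seq) - k + 1):
            counts[seq[i:i + k]] += 1
    return counts


def residue_frequencies(sequences):
    """Relative frequency of each residue over the whole protein set."""
    counts = Counter()
    for seq in sequences:
        counts.update(seq)
    total = sum(counts.values())
    return {aa: n / total for aa, n in counts.items()}


def expected_count(pattern, freqs, n_windows):
    """Expected occurrences if residues were independent (assumed model)."""
    return n_windows * prod(freqs.get(aa, 0.0) for aa in pattern)


def pop_and_nep(sequences, alphabet, k=5, top=100):
    """Hypothetical POP-like and NEP-like rankings.

    POP-like: observed patterns with the highest observed/expected ratio.
    NEP-like: absent patterns with the highest expected count.
    """
    observed = count_kmers(sequences, k)
    freqs = residue_frequencies(sequences)
    n_windows = sum(max(len(s) - k + 1, 0) for s in sequences)

    def ratio(p):
        # Every residue of an observed pattern occurs in the set, so expected > 0.
        return observed[p] / expected_count(p, freqs, n_windows)

    pop = heapq.nlargest(top, observed, key=ratio)
    # Enumerates len(alphabet)**k candidates: 3.2 million for 20 residues, k=5.
    candidates = ("".join(t) for t in product(alphabet, repeat=k))
    absent = (p for p in candidates if p not in observed)
    nep = heapq.nlargest(top, absent, key=lambda p: expected_count(p, freqs, n_windows))
    return pop, nep


def orp_and_urp(kingdom_sets):
    """Map each kingdom to (ORP-like, URP-like) pattern sets.

    kingdom_sets: dict of kingdom name -> set of patterns observed there.
    ORP-like: patterns seen only in this kingdom.
    URP-like (assumed reading): patterns seen in all other kingdoms but not here.
    """
    result = {}
    for name, patterns in kingdom_sets.items():
        others = [s for other, s in kingdom_sets.items() if other != name]
        seen_elsewhere = set().union(*others)
        in_all_others = set.intersection(*others) if others else set()
        result[name] = (patterns - seen_elsewhere, in_all_others - patterns)
    return result
```

In a realistic run, `sequences` would hold the proteins of one kingdom (from Swiss-Prot or a genome set), `alphabet` the 20 standard residues, and the sets passed to `orp_and_urp` would be the keys of each kingdom's `count_kmers` result.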

Journal:	BMC Genomics (Research Article)
Publication date:	2007-10-01
License:	Copyright © 2007 Bresell and Persson; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.