
Characterization of oligopeptide patterns in large protein sets

Bibliographic details
Main authors: Bresell, Anders; Persson, Bengt
Format: Text
Language: English
Published: BioMed Central, 2007
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2231379/
https://www.ncbi.nlm.nih.gov/pubmed/17908308
http://dx.doi.org/10.1186/1471-2164-8-346
Collection: PubMed
Abstract

BACKGROUND: Recent sequencing projects and the growth of sequence data banks enable oligopeptide patterns to be characterized on a genome or kingdom level. Several studies have focused on kingdom or habitat classifications based on the abundance of short peptide patterns. There have also been efforts at local structural prediction based on short sequence motifs. Oligopeptide patterns undoubtedly carry valuable information content. Therefore, it is important to characterize these informational peptide patterns to shed light on possible new applications and on the pitfalls implicit in neglecting bias in peptide patterns.

RESULTS: We have studied four classes of pentapeptide patterns (designated POP, NEP, ORP and URP) in the kingdoms archaea, bacteria and eukaryotes. POP are highly abundant patterns that are statistically not expected to exist; NEP are patterns that do not exist but are statistically expected to; ORP are patterns unique to a kingdom; and URP are patterns excluded from a kingdom. We used two data sources: Swiss-Prot, the de facto standard of protein knowledge, and a set of 386 completely sequenced genomes. For each class of peptides we looked at the 100 most extreme patterns and found both known and unknown sequence features. Most of the known sequence motifs can be explained on the basis of the protein families from which they originate.

CONCLUSION: We find an inherent bias of certain oligopeptide patterns in naturally occurring proteins that cannot be explained solely on the basis of residue distribution in single proteins, kingdoms or databases. We see three predominant categories of patterns: (i) patterns widespread in a kingdom, such as those originating from respiratory chain-associated proteins and the translation machinery; (ii) proteins with structurally and/or functionally favored patterns, which have not yet been ascribed this role; (iii) multicopy species-specific retrotransposons, found only in the genome set. These categories will affect the accuracy of sequence pattern algorithms that rely mainly on amino acid residue usage. Methods presented in this paper may be used to discover targets for antibiotics, as we identify numerous examples of kingdom-specific antigens among our peptide classes. The methods may also be useful for detecting coding regions of genes.
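The abstract defines the four pattern classes only informally. The following is a minimal sketch of how such candidates could be computed, not the authors' method. It assumes a zero-order background model (the expected count of a pentapeptide is the number of windows times the product of its residue frequencies), a Poisson-style z-score for over-representation, and the reading that a URP is absent from one kingdom but present in every other kingdom. All function names, the scoring choice, and the toy inputs are hypothetical.

```python
import heapq
import math
from collections import Counter
from itertools import product

# Hypothetical defaults: the 20 standard amino acids and pentapeptides (k = 5).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def count_kmers(seqs, k=5, alphabet=AMINO_ACIDS):
    """Count overlapping k-mers, skipping windows that contain non-standard residues."""
    allowed = set(alphabet)
    counts = Counter()
    for seq in seqs:
        for i in range(len(seq) - k + 1):
            window = seq[i:i + k]
            if set(window) <= allowed:
                counts[window] += 1
    return counts


def residue_freqs(seqs, alphabet=AMINO_ACIDS):
    """Relative frequency of each residue across the whole protein set."""
    c = Counter(ch for seq in seqs for ch in seq if ch in alphabet)
    total = sum(c.values())
    if total == 0:
        raise ValueError("no residues from the alphabet in the input")
    return {a: c[a] / total for a in alphabet}


def expected_count(kmer, freqs, n_windows):
    """Assumed zero-order model: E = N * product of the residue frequencies."""
    return n_windows * math.prod(freqs[a] for a in kmer)


def pop_and_nep(seqs, k=5, alphabet=AMINO_ACIDS, top=100):
    """Return (POP-like, NEP-like) candidate lists for one protein set."""
    counts = count_kmers(seqs, k, alphabet)
    freqs = residue_freqs(seqs, alphabet)
    n = sum(counts.values())  # number of valid k-mer windows

    def z_score(kmer):
        # Every residue of an observed k-mer has a non-zero frequency,
        # so the expectation here is always positive.
        e = expected_count(kmer, freqs, n)
        return (counts[kmer] - e) / math.sqrt(e)

    # POP-like: observed patterns far more abundant than the model predicts.
    pop = heapq.nlargest(top, counts, key=z_score)
    # NEP-like: patterns never observed, ranked by their expected count.
    # This enumerates len(alphabet) ** k patterns (3.2 million for pentapeptides).
    missing = (km for km in map("".join, product(alphabet, repeat=k)) if km not in counts)
    nep = heapq.nlargest(top, missing, key=lambda km: expected_count(km, freqs, n))
    return pop, nep


def orp_and_urp(kingdom_seqs, k=5, alphabet=AMINO_ACIDS):
    """Per kingdom: ORP = present only there; URP = absent there, present in all others."""
    present = {name: set(count_kmers(seqs, k, alphabet)) for name, seqs in kingdom_seqs.items()}
    orp, urp = {}, {}
    for name, kmers in present.items():
        others = [s for other, s in present.items() if other != name]
        in_any_other = set().union(*others)
        in_all_others = set.intersection(*others) if others else set()
        orp[name] = kmers - in_any_other
        urp[name] = in_all_others - kmers
    return orp, urp
```

With real data, `kingdom_seqs` would map each kingdom name (archaea, bacteria, eukaryotes) to its list of protein sequences, for example parsed from Swiss-Prot or from a genome set; that mapping is a placeholder here. Because the abstract reports biases that residue composition alone does not explain, a global background like this one is only a starting point; a per-kingdom or per-protein background model could be substituted in `expected_count`.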
Record ID: pubmed-2231379
Institution: National Center for Biotechnology Information
Record format: MEDLINE/PubMed
Published in: BMC Genomics (Research Article), BioMed Central, 2007-10-01
Copyright © 2007 Bresell and Persson; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.