
JACOP: A simple and robust method for the automated classification of protein sequences with modular architecture

BACKGROUND: Whole-genome sequencing projects are rapidly producing an enormous number of new sequences. Consequently almost every family of proteins now contains hundreds of members. It has thus become necessary to develop tools, which classify protein sequences automatically and also quickly and re...

Bibliographic Details
Main Authors: Sperisen, Peter; Pagni, Marco
Format: Text
Language: English
Published: BioMed Central 2005
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1208858/
https://www.ncbi.nlm.nih.gov/pubmed/16135248
http://dx.doi.org/10.1186/1471-2105-6-216
_version_ 1782124917405253632
author Sperisen, Peter
Pagni, Marco
author_facet Sperisen, Peter
Pagni, Marco
author_sort Sperisen, Peter
collection PubMed
description BACKGROUND: Whole-genome sequencing projects are rapidly producing an enormous number of new sequences. Consequently, almost every family of proteins now contains hundreds of members. It has thus become necessary to develop tools that classify protein sequences automatically, quickly and reliably. The difficulty of this task is intimately linked to the mechanism by which protein sequences diverge, i.e. by simultaneous residue substitutions, insertions and/or deletions and whole domain reorganisations (duplications/swapping/fusion). RESULTS: Here we present a novel approach based on random sampling of sub-sequences (probes) out of a set of input sequences. The probes are compared to the input sequences and, after a normalisation step, the results are used to partition the input sequences into homogeneous groups of proteins. In addition, this method provides information on diagnostic parts of the proteins. The performance of this method is tested on two data sets. The first one contains the sequences of prokaryotic lyases that could be arranged as a multiple sequence alignment. The second one contains all proteins from Swiss-Prot Release 36 with at least one Src homology 2 (SH2) domain, a classical example of proteins with modular architecture. CONCLUSION: The outcome of our method is robust and highly reproducible, as shown using bootstrap and resampling validation procedures. The results are essentially coherent with the biology. This method depends solely on well-established, publicly available software and algorithms.
format Text
id pubmed-1208858
institution National Center for Biotechnology Information
language English
publishDate 2005
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-12088582005-09-15 JACOP: A simple and robust method for the automated classification of protein sequences with modular architecture Sperisen, Peter Pagni, Marco BMC Bioinformatics Methodology Article BACKGROUND: Whole-genome sequencing projects are rapidly producing an enormous number of new sequences. Consequently almost every family of proteins now contains hundreds of members. It has thus become necessary to develop tools, which classify protein sequences automatically and also quickly and reliably. The difficulty of this task is intimately linked to the mechanism by which protein sequences diverge, i.e. by simultaneous residue substitutions, insertions and/or deletions and whole domain reorganisations (duplications/swapping/fusion). RESULTS: Here we present a novel approach, which is based on random sampling of sub-sequences (probes) out of a set of input sequences. The probes are compared to the input sequences, after a normalisation step; the results are used to partition the input sequences into homogeneous groups of proteins. In addition, this method provides information on diagnostic parts of the proteins. The performance of this method is challenged by two data sets. The first one contains the sequences of prokaryotic lyases that could be arranged as a multiple sequence alignment. The second one contains all proteins from Swiss-Prot Release 36 with at least one Src homology 2 (SH2) domain – a classical example for proteins with modular architecture. CONCLUSION: The outcome of our method is robust, highly reproducible as shown using bootstrap and resampling validation procedures. The results are essentially coherent with the biology. This method depends solely on well-established publicly available software and algorithms. BioMed Central 2005-08-31 /pmc/articles/PMC1208858/ /pubmed/16135248 http://dx.doi.org/10.1186/1471-2105-6-216 Text en Copyright © 2005 Sperisen and Pagni; licensee BioMed Central Ltd. 
http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
spellingShingle Methodology Article
Sperisen, Peter
Pagni, Marco
JACOP: A simple and robust method for the automated classification of protein sequences with modular architecture
title JACOP: A simple and robust method for the automated classification of protein sequences with modular architecture
title_full JACOP: A simple and robust method for the automated classification of protein sequences with modular architecture
title_fullStr JACOP: A simple and robust method for the automated classification of protein sequences with modular architecture
title_full_unstemmed JACOP: A simple and robust method for the automated classification of protein sequences with modular architecture
title_short JACOP: A simple and robust method for the automated classification of protein sequences with modular architecture
title_sort jacop: a simple and robust method for the automated classification of protein sequences with modular architecture
topic Methodology Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1208858/
https://www.ncbi.nlm.nih.gov/pubmed/16135248
http://dx.doi.org/10.1186/1471-2105-6-216
work_keys_str_mv AT sperisenpeter jacopasimpleandrobustmethodfortheautomatedclassificationofproteinsequenceswithmodulararchitecture
AT pagnimarco jacopasimpleandrobustmethodfortheautomatedclassificationofproteinsequenceswithmodulararchitecture