
Classification of protein sequences by means of irredundant patterns

BACKGROUND: The classification of protein sequences using string algorithms provides valuable insights for protein function prediction. Several methods, based on a variety of different patterns, have been previously proposed. Almost all string-based approaches discover patterns that are not "in...


Bibliographic Details
Main Authors:	Comin, Matteo; Verzotto, Davide
Format:	Text
Language:	English
Published:	BioMed Central 2010
Subjects:	Research
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3009487/ https://www.ncbi.nlm.nih.gov/pubmed/20122187 http://dx.doi.org/10.1186/1471-2105-11-S1-S16

_version_	1782194689635516416
author	Comin, Matteo Verzotto, Davide
author_facet	Comin, Matteo Verzotto, Davide
author_sort	Comin, Matteo
collection	PubMed
description	BACKGROUND: The classification of protein sequences using string algorithms provides valuable insights for protein function prediction. Several methods, based on a variety of different patterns, have been previously proposed. Almost all string-based approaches discover patterns that are not "independent," and therefore the associated scores overcount, a multiple number of times, the contribution of patterns that cover the same region of a sequence. RESULTS: In this paper we use a class of patterns, called irredundant, that is specifically designed to address this issue. Loosely speaking the set of irredundant patterns is the smallest class of "independent" patterns that can describe all common patterns in two sequences, thus they avoid overcounting. We present a novel discriminative method, called Irredundant Class, based on the statistics of irredundant patterns combined with the power of support vector machines. CONCLUSION: Tests on benchmark data show that Irredundant Class outperforms most of the string algorithms previously proposed, and it achieves results as good as current state-of-the-art methods. Moreover the footprints of the most discriminative irredundant patterns can be used to guide the identification of functional regions in protein sequences.
format	Text
id	pubmed-3009487
institution	National Center for Biotechnology Information
language	English
publishDate	2010
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-3009487 2010-12-23 Classification of protein sequences by means of irredundant patterns Comin, Matteo Verzotto, Davide BMC Bioinformatics Research BACKGROUND: The classification of protein sequences using string algorithms provides valuable insights for protein function prediction. Several methods, based on a variety of different patterns, have been previously proposed. Almost all string-based approaches discover patterns that are not "independent," and therefore the associated scores overcount, a multiple number of times, the contribution of patterns that cover the same region of a sequence. RESULTS: In this paper we use a class of patterns, called irredundant, that is specifically designed to address this issue. Loosely speaking the set of irredundant patterns is the smallest class of "independent" patterns that can describe all common patterns in two sequences, thus they avoid overcounting. We present a novel discriminative method, called Irredundant Class, based on the statistics of irredundant patterns combined with the power of support vector machines. CONCLUSION: Tests on benchmark data show that Irredundant Class outperforms most of the string algorithms previously proposed, and it achieves results as good as current state-of-the-art methods. Moreover the footprints of the most discriminative irredundant patterns can be used to guide the identification of functional regions in protein sequences. BioMed Central 2010-01-18 /pmc/articles/PMC3009487/ /pubmed/20122187 http://dx.doi.org/10.1186/1471-2105-11-S1-S16 Text en Copyright ©2010 Comin and Verzotto; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
spellingShingle	Research Comin, Matteo Verzotto, Davide Classification of protein sequences by means of irredundant patterns
title	Classification of protein sequences by means of irredundant patterns
title_full	Classification of protein sequences by means of irredundant patterns
title_fullStr	Classification of protein sequences by means of irredundant patterns
title_full_unstemmed	Classification of protein sequences by means of irredundant patterns
title_short	Classification of protein sequences by means of irredundant patterns
title_sort	classification of protein sequences by means of irredundant patterns
topic	Research
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3009487/ https://www.ncbi.nlm.nih.gov/pubmed/20122187 http://dx.doi.org/10.1186/1471-2105-11-S1-S16
work_keys_str_mv	AT cominmatteo classificationofproteinsequencesbymeansofirredundantpatterns AT verzottodavide classificationofproteinsequencesbymeansofirredundantpatterns
