
An analysis of single amino acid repeats as use case for application specific background models

BACKGROUND: Sequence analysis aims to identify biologically relevant signals against a backdrop of functionally meaningless variation. Increasingly, it is recognized that the quality of the background model directly affects the performance of analyses. State-of-the-art approaches rely on classical s...


Bibliographic Details
Main Authors:	Łabaj, Paweł P, Sykacek, Peter, Kreil, David P
Format:	Online Article Text
Language:	English
Published:	BioMed Central 2011
Subjects:	Research Article
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3124433/ https://www.ncbi.nlm.nih.gov/pubmed/21595908 http://dx.doi.org/10.1186/1471-2105-12-173

_version_	1782207087790522368
author	Łabaj, Paweł P Sykacek, Peter Kreil, David P
author_facet	Łabaj, Paweł P Sykacek, Peter Kreil, David P
author_sort	Łabaj, Paweł P
collection	PubMed
description	BACKGROUND: Sequence analysis aims to identify biologically relevant signals against a backdrop of functionally meaningless variation. Increasingly, it is recognized that the quality of the background model directly affects the performance of analyses. State-of-the-art approaches rely on classical sequence models that are adapted to the studied dataset. Although performing well in the analysis of globular protein domains, these models break down in regions of stronger compositional bias or low complexity. While these regions are typically filtered, there is increasing anecdotal evidence of functional roles. This motivates an exploration of more complex sequence models and application-specific approaches for the investigation of biased regions. RESULTS: Traditional Markov-chains and application-specific regression models are compared using the example of predicting runs of single amino acids, a particularly simple class of biased regions. Cross-fold validation experiments reveal that the alternative regression models capture the multi-variate trends well, despite their low dimensionality and in contrast even to higher-order Markov-predictors. We show how the significance of unusual observations can be computed for such empirical models. The power of a dedicated model in the detection of biologically interesting signals is then demonstrated in an analysis identifying the unexpected enrichment of contiguous leucine-repeats in signal-peptides. Considering different reference sets, we show how the question examined actually defines what constitutes the 'background'. Results can thus be highly sensitive to the choice of appropriate model training sets. Conversely, the choice of reference data determines the questions that can be investigated in an analysis. 
CONCLUSIONS: Using a specific case of studying biased regions as an example, we have demonstrated that the construction of application-specific background models is both necessary and feasible in a challenging sequence analysis situation.
format	Online Article Text
id	pubmed-3124433
institution	National Center for Biotechnology Information
language	English
publishDate	2011
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-3124433 2011-06-28 An analysis of single amino acid repeats as use case for application specific background models Łabaj, Paweł P Sykacek, Peter Kreil, David P BMC Bioinformatics Research Article BioMed Central 2011-05-19 /pmc/articles/PMC3124433/ /pubmed/21595908 http://dx.doi.org/10.1186/1471-2105-12-173 Text en Copyright ©2011 Łabaj et al; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
spellingShingle	Research Article Łabaj, Paweł P Sykacek, Peter Kreil, David P An analysis of single amino acid repeats as use case for application specific background models
title	An analysis of single amino acid repeats as use case for application specific background models
title_sort	analysis of single amino acid repeats as use case for application specific background models
topic	Research Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3124433/ https://www.ncbi.nlm.nih.gov/pubmed/21595908 http://dx.doi.org/10.1186/1471-2105-12-173

