
In-silico prediction of disorder content using hybrid sequence representation

BACKGROUND: Intrinsically disordered proteins play important roles in various cellular activities and their prevalence has been implicated in a number of human diseases. Knowledge of the intrinsic disorder content of proteins is useful for a variety of studies including estimation of the abund...


Bibliographic Details
Main authors:	Mizianty, Marcin J; Zhang, Tuo; Xue, Bin; Zhou, Yaoqi; Dunker, A Keith; Uversky, Vladimir N; Kurgan, Lukasz
Format:	Online Article Text
Language:	English
Published:	BioMed Central 2011
Subjects:	Methodology Article
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3212983/ https://www.ncbi.nlm.nih.gov/pubmed/21682902 http://dx.doi.org/10.1186/1471-2105-12-245

_version_	1782216057497321472
author	Mizianty, Marcin J Zhang, Tuo Xue, Bin Zhou, Yaoqi Dunker, A Keith Uversky, Vladimir N Kurgan, Lukasz
author_facet	Mizianty, Marcin J Zhang, Tuo Xue, Bin Zhou, Yaoqi Dunker, A Keith Uversky, Vladimir N Kurgan, Lukasz
author_sort	Mizianty, Marcin J
collection	PubMed
description	BACKGROUND: Intrinsically disordered proteins play important roles in various cellular activities and their prevalence has been implicated in a number of human diseases. Knowledge of the intrinsic disorder content of proteins is useful for a variety of studies, including estimation of the abundance of disorder in protein families, classes, and complete proteomes, and for the analysis of disorder-related protein functions. These investigations currently use the disorder content derived from per-residue disorder predictions. We show that these predictions may over- or under-predict the overall amount of disorder, which motivates the development of novel tools for direct and accurate sequence-based prediction of the disorder content. RESULTS: We hypothesize that sequence-level aggregation of input information may provide more accurate content prediction than the content extracted from local window-based residue-level disorder predictors. We propose a novel predictor, DisCon, that takes advantage of a small set of 29 custom-designed descriptors that aggregate and hybridize information concerning sequence, evolutionary profiles, and predicted secondary structure, solvent accessibility, flexibility, and annotation of globular domains. Using these descriptors and a ridge regression model, DisCon predicts the content with a low mean squared error of 0.05 and a high Pearson correlation of 0.68. This is a statistically significant improvement over the content computed from the outputs of ten modern disorder predictors on a test dataset with proteins that share low sequence identity with the training sequences. The proposed predictive model is analyzed to discuss factors related to the prediction of the disorder content. CONCLUSIONS: DisCon is a high-quality alternative for high-throughput annotation of the disorder content. We also empirically demonstrate that DisCon's predictions can be used to improve binary annotations of the disordered residues from the real-value disorder propensities generated by current residue-level disorder predictors. The web server that implements DisCon is available at http://biomine.ece.ualberta.ca/DisCon/.
format	Online Article Text
id	pubmed-3212983
institution	National Center for Biotechnology Information
language	English
publishDate	2011
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-3212983 2011-11-11 In-silico prediction of disorder content using hybrid sequence representation Mizianty, Marcin J Zhang, Tuo Xue, Bin Zhou, Yaoqi Dunker, A Keith Uversky, Vladimir N Kurgan, Lukasz BMC Bioinformatics Methodology Article BioMed Central 2011-06-17 /pmc/articles/PMC3212983/ /pubmed/21682902 http://dx.doi.org/10.1186/1471-2105-12-245 Text en Copyright ©2011 Mizianty et al; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
spellingShingle	Methodology Article Mizianty, Marcin J Zhang, Tuo Xue, Bin Zhou, Yaoqi Dunker, A Keith Uversky, Vladimir N Kurgan, Lukasz In-silico prediction of disorder content using hybrid sequence representation
title	In-silico prediction of disorder content using hybrid sequence representation
title_sort	in-silico prediction of disorder content using hybrid sequence representation
topic	Methodology Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3212983/ https://www.ncbi.nlm.nih.gov/pubmed/21682902 http://dx.doi.org/10.1186/1471-2105-12-245
