
Computational identification of ubiquitylation sites from protein sequences

BACKGROUND: Ubiquitylation plays an important role in regulating protein functions. Recently, experimental methods were developed toward effective identification of ubiquitylation sites. To efficiently explore more undiscovered ubiquitylation sites, this study aims to develop an accurate sequence-ba...


Bibliographic Details
Main authors:	Tung, Chun-Wei; Ho, Shinn-Ying
Format:	Text
Language:	English
Published:	BioMed Central 2008
Subjects:	Research Article
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2488362/ https://www.ncbi.nlm.nih.gov/pubmed/18625080 http://dx.doi.org/10.1186/1471-2105-9-310

_version_	1782158124387401728
author	Tung, Chun-Wei; Ho, Shinn-Ying
author_facet	Tung, Chun-Wei; Ho, Shinn-Ying
author_sort	Tung, Chun-Wei
collection	PubMed
description	BACKGROUND: Ubiquitylation plays an important role in regulating protein functions. Recently, experimental methods have been developed for the effective identification of ubiquitylation sites. To explore undiscovered ubiquitylation sites efficiently, this study aims to develop an accurate sequence-based prediction method that identifies promising ubiquitylation sites. RESULTS: We established a ubiquitylation dataset consisting of 157 ubiquitylation sites and 3676 putative non-ubiquitylation sites extracted from 105 proteins in the UbiProt database. This study first evaluates sequence-based features and classifiers for the prediction of ubiquitylation sites by assessing three kinds of features (amino acid identity, evolutionary information, and physicochemical properties) and three classifiers (support vector machine, k-nearest neighbor, and Naïve Bayes). Results show that the set of 531 physicochemical properties is the best kind of feature and the support vector machine (SVM) is the best classifier; their combination has a prediction accuracy of 72.19% using leave-one-out cross-validation. Consequently, an informative physicochemical property mining algorithm (IPMA) is proposed to select an informative subset of the 531 physicochemical properties. A prediction system, UbiPred, was implemented using an SVM with a feature set of 31 informative physicochemical properties selected by IPMA, which improves the accuracy from 72.19% to 84.44%. To further analyze the informative physicochemical properties, the decision tree method C5.0 was used to acquire if-then rule-based knowledge for predicting ubiquitylation sites. UbiPred can screen promising ubiquitylation sites from putative non-ubiquitylation sites using prediction scores. By applying UbiPred, 23 promising ubiquitylation sites were identified from an independent dataset of 3424 putative non-ubiquitylation sites; these sites were also validated using the obtained prediction rules.
CONCLUSION: We have proposed an algorithm, IPMA, for mining informative physicochemical properties from protein sequences to build an SVM-based prediction system, UbiPred. UbiPred predicts ubiquitylation sites, each accompanied by a prediction score, to help biologists identify promising sites for experimental verification. UbiPred has been implemented as a web server and is available at .
format	Text
id	pubmed-2488362
institution	National Center for Biotechnology Information
language	English
publishDate	2008
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-2488362 2008-07-29 Computational identification of ubiquitylation sites from protein sequences Tung, Chun-Wei; Ho, Shinn-Ying BMC Bioinformatics Research Article BioMed Central 2008-07-15 /pmc/articles/PMC2488362/ /pubmed/18625080 http://dx.doi.org/10.1186/1471-2105-9-310 Text en Copyright © 2008 Tung and Ho; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
spellingShingle	Research Article Tung, Chun-Wei Ho, Shinn-Ying Computational identification of ubiquitylation sites from protein sequences
title	Computational identification of ubiquitylation sites from protein sequences
title_sort	computational identification of ubiquitylation sites from protein sequences
topic	Research Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2488362/ https://www.ncbi.nlm.nih.gov/pubmed/18625080 http://dx.doi.org/10.1186/1471-2105-9-310
work_keys_str_mv	AT tungchunwei computationalidentificationofubiquitylationsitesfromproteinsequences AT hoshinnying computationalidentificationofubiquitylationsitesfromproteinsequences
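
The abstract in this record describes encoding candidate sites with physicochemical properties and scoring classifiers by leave-one-out cross-validation. The following is a minimal sketch of those two ideas, not the authors' implementation. Several things are assumptions made for illustration: candidate sites are taken to be lysine (K) residues, the window size (`flank`) and padding symbol are hypothetical, the Kyte-Doolittle hydropathy index stands in for the 531 properties used in the paper, and a small k-nearest-neighbor classifier (one of the three classifiers named in the abstract) replaces the SVM so that the example needs only the standard library.

```python
from math import dist

# Kyte-Doolittle hydropathy index: ONE example property chosen for this
# sketch. The paper's 531 properties are not reproduced here.
HYDROPATHY = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def lysine_windows(seq, flank=10, pad="X"):
    """Yield (1-based position, window) for every lysine in seq.

    flank and pad are hypothetical choices, not values from the paper.
    """
    padded = pad * flank + seq + pad * flank
    for i, aa in enumerate(seq):
        if aa == "K":
            # In the padded string the lysine sits at index i + flank,
            # so this slice is centered on it.
            yield i + 1, padded[i:i + 2 * flank + 1]


def encode(window, scale=HYDROPATHY):
    """Map each residue to its property value; padding or unknown -> 0.0."""
    return [scale.get(aa, 0.0) for aa in window]


def knn_predict(train, query, k=3):
    """Majority vote among the k nearest training vectors (Euclidean)."""
    nearest = sorted(train, key=lambda item: dist(item[0], query))[:k]
    votes = sum(label for _, label in nearest)
    return 1 if 2 * votes > len(nearest) else 0


def leave_one_out_accuracy(samples, k=3):
    """Hold out each sample once, predict it from the rest, count hits."""
    hits = 0
    for i, (vec, label) in enumerate(samples):
        rest = samples[:i] + samples[i + 1:]
        hits += knn_predict(rest, vec, k) == label
    return hits / len(samples)
```

In use, each window from `lysine_windows` would be passed through `encode` and paired with a 0/1 label (1 for an annotated ubiquitylation site) to form the `samples` list given to `leave_one_out_accuracy`. With a class imbalance like the one reported in the abstract (157 positive versus 3676 putative negative sites), plain accuracy on the raw data can be misleading, so some form of balancing or a complementary metric would be worth considering.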
