Cargando…

Protein localization prediction using random walks on graphs

BACKGROUND: Understanding the localization of proteins in cells is vital to characterizing their functions and possible interactions. As a result, identifying the (sub)cellular compartment within which a protein is located becomes an important problem in protein classification. This classification i...

Descripción completa

Detalles Bibliográficos
Autores principales: Xu, Xiaohua, Lu, Lin, He, Ping, Chen, Ling
Formato: Online Artículo Texto
Lenguaje:English
Publicado: BioMed Central 2013
Materias:
Acceso en línea:https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3654884/
https://www.ncbi.nlm.nih.gov/pubmed/23815126
http://dx.doi.org/10.1186/1471-2105-14-S8-S4
Descripción
Sumario:BACKGROUND: Understanding the localization of proteins in cells is vital to characterizing their functions and possible interactions. As a result, identifying the (sub)cellular compartment within which a protein is located becomes an important problem in protein classification. This classification issue thus involves predicting labels in a dataset with a limited number of labeled data points available. By utilizing a graph representation of protein data, random walk techniques have performed well in sequence classification and functional prediction; however, this method has not yet been applied to protein localization. Accordingly, we propose a novel classifier in the site prediction of proteins based on random walks on a graph. RESULTS: We propose a graph theory model for predicting protein localization using data generated in yeast and gram-negative (Gneg) bacteria. We tested the performance of our classifier on the two datasets, optimizing the model training parameters by varying the laziness values and the number of steps taken during the random walk. Using 10-fold cross-validation, we achieved an accuracy of above 61% for yeast data and about 93% for gram-negative bacteria. CONCLUSIONS: This study presents a new classifier derived from the random walk technique and applies this classifier to investigate the cellular localization of proteins. The prediction accuracy and additional validation demonstrate an improvement over previous methods, such as support vector machine (SVM)-based classifiers.