Protein localization prediction using random walks on graphs

BACKGROUND: Understanding the localization of proteins in cells is vital to characterizing their functions and possible interactions. As a result, identifying the (sub)cellular compartment within which a protein is located becomes an important problem in protein classification. This classification i...

Bibliographic Details
Main authors: Xu, Xiaohua; Lu, Lin; He, Ping; Chen, Ling
Format: Online Article Text
Language: English
Published: BioMed Central 2013
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3654884/
https://www.ncbi.nlm.nih.gov/pubmed/23815126
http://dx.doi.org/10.1186/1471-2105-14-S8-S4
author Xu, Xiaohua
Lu, Lin
He, Ping
Chen, Ling
collection PubMed
description BACKGROUND: Understanding the localization of proteins in cells is vital to characterizing their functions and possible interactions. As a result, identifying the (sub)cellular compartment within which a protein is located becomes an important problem in protein classification. This classification issue thus involves predicting labels in a dataset with a limited number of labeled data points available. By utilizing a graph representation of protein data, random walk techniques have performed well in sequence classification and functional prediction; however, this method has not yet been applied to protein localization. Accordingly, we propose a novel classifier in the site prediction of proteins based on random walks on a graph. RESULTS: We propose a graph theory model for predicting protein localization using data generated in yeast and gram-negative (Gneg) bacteria. We tested the performance of our classifier on the two datasets, optimizing the model training parameters by varying the laziness values and the number of steps taken during the random walk. Using 10-fold cross-validation, we achieved an accuracy of above 61% for yeast data and about 93% for gram-negative bacteria. CONCLUSIONS: This study presents a new classifier derived from the random walk technique and applies this classifier to investigate the cellular localization of proteins. The prediction accuracy and additional validation demonstrate an improvement over previous methods, such as support vector machine (SVM)-based classifiers.
format Online
Article
Text
id pubmed-3654884
institution National Center for Biotechnology Information
language English
publishDate 2013
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-3654884 2013-05-20 Protein localization prediction using random walks on graphs Xu, Xiaohua Lu, Lin He, Ping Chen, Ling BMC Bioinformatics Proceedings BioMed Central 2013-05-09 /pmc/articles/PMC3654884/ /pubmed/23815126 http://dx.doi.org/10.1186/1471-2105-14-S8-S4 Text en Copyright © 2013 Xu et al.; licensee BioMed Central Ltd.
http://creativecommons.org/licenses/by/2.0 This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title Protein localization prediction using random walks on graphs
topic Proceedings