Protein localization prediction using random walks on graphs
BACKGROUND: Understanding the localization of proteins in cells is vital to characterizing their functions and possible interactions. As a result, identifying the (sub)cellular compartment within which a protein is located becomes an important problem in protein classification. This classification i...
Main authors: | Xu, Xiaohua; Lu, Lin; He, Ping; Chen, Ling |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central, 2013 |
Subjects: | Proceedings |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3654884/ https://www.ncbi.nlm.nih.gov/pubmed/23815126 http://dx.doi.org/10.1186/1471-2105-14-S8-S4 |
_version_ | 1782269785079283712 |
---|---|
author | Xu, Xiaohua Lu, Lin He, Ping Chen, Ling |
author_facet | Xu, Xiaohua Lu, Lin He, Ping Chen, Ling |
author_sort | Xu, Xiaohua |
collection | PubMed |
description | BACKGROUND: Understanding the localization of proteins in cells is vital to characterizing their functions and possible interactions. As a result, identifying the (sub)cellular compartment within which a protein is located becomes an important problem in protein classification. This classification issue thus involves predicting labels in a dataset with a limited number of labeled data points available. By utilizing a graph representation of protein data, random walk techniques have performed well in sequence classification and functional prediction; however, this method has not yet been applied to protein localization. Accordingly, we propose a novel classifier in the site prediction of proteins based on random walks on a graph. RESULTS: We propose a graph theory model for predicting protein localization using data generated in yeast and gram-negative (Gneg) bacteria. We tested the performance of our classifier on the two datasets, optimizing the model training parameters by varying the laziness values and the number of steps taken during the random walk. Using 10-fold cross-validation, we achieved an accuracy of above 61% for yeast data and about 93% for gram-negative bacteria. CONCLUSIONS: This study presents a new classifier derived from the random walk technique and applies this classifier to investigate the cellular localization of proteins. The prediction accuracy and additional validation demonstrate an improvement over previous methods, such as support vector machine (SVM)-based classifiers. |
format | Online Article Text |
id | pubmed-3654884 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2013 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-36548842013-05-20 Protein localization prediction using random walks on graphs Xu, Xiaohua Lu, Lin He, Ping Chen, Ling BMC Bioinformatics Proceedings BACKGROUND: Understanding the localization of proteins in cells is vital to characterizing their functions and possible interactions. As a result, identifying the (sub)cellular compartment within which a protein is located becomes an important problem in protein classification. This classification issue thus involves predicting labels in a dataset with a limited number of labeled data points available. By utilizing a graph representation of protein data, random walk techniques have performed well in sequence classification and functional prediction; however, this method has not yet been applied to protein localization. Accordingly, we propose a novel classifier in the site prediction of proteins based on random walks on a graph. RESULTS: We propose a graph theory model for predicting protein localization using data generated in yeast and gram-negative (Gneg) bacteria. We tested the performance of our classifier on the two datasets, optimizing the model training parameters by varying the laziness values and the number of steps taken during the random walk. Using 10-fold cross-validation, we achieved an accuracy of above 61% for yeast data and about 93% for gram-negative bacteria. CONCLUSIONS: This study presents a new classifier derived from the random walk technique and applies this classifier to investigate the cellular localization of proteins. The prediction accuracy and additional validation demonstrate an improvement over previous methods, such as support vector machine (SVM)-based classifiers. BioMed Central 2013-05-09 /pmc/articles/PMC3654884/ /pubmed/23815126 http://dx.doi.org/10.1186/1471-2105-14-S8-S4 Text en Copyright © 2013 Xu et al.; licensee BioMed Central Ltd. 
http://creativecommons.org/licenses/by/2.0 This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. |
spellingShingle | Proceedings Xu, Xiaohua Lu, Lin He, Ping Chen, Ling Protein localization prediction using random walks on graphs |
title | Protein localization prediction using random walks on graphs |
title_full | Protein localization prediction using random walks on graphs |
title_fullStr | Protein localization prediction using random walks on graphs |
title_full_unstemmed | Protein localization prediction using random walks on graphs |
title_short | Protein localization prediction using random walks on graphs |
title_sort | protein localization prediction using random walks on graphs |
topic | Proceedings |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3654884/ https://www.ncbi.nlm.nih.gov/pubmed/23815126 http://dx.doi.org/10.1186/1471-2105-14-S8-S4 |
work_keys_str_mv | AT xuxiaohua proteinlocalizationpredictionusingrandomwalksongraphs AT lulin proteinlocalizationpredictionusingrandomwalksongraphs AT heping proteinlocalizationpredictionusingrandomwalksongraphs AT chenling proteinlocalizationpredictionusingrandomwalksongraphs |
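
The description above mentions two tuning parameters, the laziness value and the number of steps taken during the random walk. The following is a minimal sketch of how a lazy random walk on a graph can be used to label unlabeled nodes. It is not the authors' classifier: the toy graph, the edge weights, the compartment names, and the function names (`lazy_transition`, `walk`, `predict`) are all assumptions made for illustration, and the scoring rule (sum the walk's probability mass on labeled nodes of each class) is one common choice rather than the one the paper necessarily uses.

```python
def lazy_transition(weights, laziness):
    """Row-stochastic matrix of a lazy walk: stay put with probability
    `laziness`, otherwise move to a neighbor in proportion to edge weight."""
    n = len(weights)
    matrix = []
    for i in range(n):
        degree = sum(weights[i])
        row = []
        for j in range(n):
            stay = 1.0 if i == j else 0.0
            if degree > 0:
                move = weights[i][j] / degree
                row.append(laziness * stay + (1.0 - laziness) * move)
            else:
                # An isolated node has nowhere to go, so the walk stays.
                row.append(stay)
        matrix.append(row)
    return matrix


def walk(matrix, start, steps):
    """Distribution over nodes after `steps` steps of a walk begun at `start`."""
    n = len(matrix)
    dist = [0.0] * n
    dist[start] = 1.0
    for _ in range(steps):
        dist = [sum(dist[i] * matrix[i][j] for i in range(n)) for j in range(n)]
    return dist


def predict(weights, labels, laziness=0.5, steps=3):
    """Assign each unlabeled node the class whose labeled nodes receive the
    most probability mass from a walk started at that node."""
    matrix = lazy_transition(weights, laziness)
    predictions = {}
    for node, label in enumerate(labels):
        if label is not None:
            continue
        dist = walk(matrix, node, steps)
        scores = {}
        for j, known in enumerate(labels):
            if known is not None:
                scores[known] = scores.get(known, 0.0) + dist[j]
        predictions[node] = max(scores, key=scores.get)
    return predictions


# Hypothetical toy graph: two triangles (nodes 0-2 and 3-5) joined by one
# weak edge between nodes 2 and 3. Only nodes 0 and 5 carry a known label.
weights = [
    [0.0, 1.0, 1.0, 0.0, 0.0, 0.0],
    [1.0, 0.0, 1.0, 0.0, 0.0, 0.0],
    [1.0, 1.0, 0.0, 0.1, 0.0, 0.0],
    [0.0, 0.0, 0.1, 0.0, 1.0, 1.0],
    [0.0, 0.0, 0.0, 1.0, 0.0, 1.0],
    [0.0, 0.0, 0.0, 1.0, 1.0, 0.0],
]
labels = ["cytoplasm", None, None, None, None, "nucleus"]

predictions = predict(weights, labels, laziness=0.5, steps=3)
print(predictions)
```

In this sketch, `laziness` and `steps` play the role of the two parameters named in the description; in practice they would be chosen by cross-validation on the labeled proteins rather than fixed by hand.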