A method to improve protein subcellular localization prediction by integrating various biological data sources

Bibliographic details
Main authors: Tung, Thai Quang; Lee, Doheon
Format: Text
Language: English
Published: BioMed Central, 2009
Subjects: Research
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2648781/
https://www.ncbi.nlm.nih.gov/pubmed/19208145
http://dx.doi.org/10.1186/1471-2105-10-S1-S43
Collection: PubMed
Description: BACKGROUND: Protein subcellular localization is crucial information for elucidating protein functions. Owing to the need for large-scale genome analysis, computational methods that efficiently predict protein subcellular localization are greatly needed. Although much previous work has addressed this task, the problem remains challenging for several reasons: the number of subcellular locations in practice is large; the distribution of proteins across locations is imbalanced, that is, the number of proteins in each location differs remarkably; and many proteins are located in multiple locations. It is therefore necessary to explore new features and appropriate classification methods to improve prediction performance. RESULTS: In this paper we propose a new prediction method that combines two key ideas: 1) information from neighbour proteins in a probabilistic gene network is integrated to enrich the prediction features; 2) fuzzy k-NN, a classification method based on fuzzy set theory, is applied to predict proteins located at multiple sites. Experiments were conducted on a dataset of budding yeast proteins spanning 22 locations, and a significant improvement was observed. CONCLUSION: Our results suggest that neighbourhood information from functional gene networks is predictive of subcellular localization. The proposed method can thus be integrated with, and complement, other available prediction methods.
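The two ideas in the RESULTS paragraph can be illustrated with a short Python sketch. It is not the authors' implementation: the record does not say how neighbour information is encoded or how fuzzy k-NN is configured, so the helper names (neighbour_features, fuzzy_knn), the network format (a dict mapping each protein to {neighbour: edge weight}), Euclidean distance, k, the fuzzifier m and the 0.3 membership threshold are all hypothetical choices. The classifier follows the standard fuzzy k-NN rule of Keller, Gray and Givens (1985) with crisp training memberships and reports every location whose membership reaches the threshold, so one protein can receive several locations.

```python
import math

def neighbour_features(protein, network, labels, locations):
    """Hypothetical network-derived features: for each location, add up the
    edge weights of the protein's neighbours annotated with that location,
    then normalise so the vector sums to 1 (all zeros if no annotated
    neighbour exists)."""
    scores = {loc: 0.0 for loc in locations}
    for neighbour, weight in network.get(protein, {}).items():
        for loc in labels.get(neighbour, ()):
            if loc in scores:
                scores[loc] += weight
    total = sum(scores.values())
    return [scores[loc] / total if total > 0 else 0.0 for loc in locations]


def fuzzy_knn(query, train_x, train_y, locations, k=3, m=2.0, threshold=0.3):
    """Fuzzy k-NN with crisp (0/1) training memberships. train_y holds a set
    of locations per training protein, so multi-location proteins are allowed.
    Returns per-location memberships and the locations at or above threshold."""
    if m <= 1.0:
        raise ValueError("fuzzifier m must be greater than 1")
    # Euclidean distance from the query to every training vector.
    dists = [math.dist(query, x) for x in train_x]
    nearest = sorted(range(len(train_x)), key=lambda j: dists[j])[:k]
    # Inverse-distance weights 1 / d^(2/(m-1)); the small epsilon keeps an
    # exact match (d == 0) from dividing by zero.
    eps = 1e-12
    weights = {j: 1.0 / (dists[j] ** (2.0 / (m - 1.0)) + eps) for j in nearest}
    total = sum(weights.values())
    membership = {
        loc: sum(w for j, w in weights.items() if loc in train_y[j]) / total
        for loc in locations
    }
    predicted = [loc for loc in locations if membership[loc] >= threshold]
    return membership, predicted


# Toy usage with made-up data (not from the paper).
locations = ["nucleus", "cytoplasm", "mitochondrion"]
network = {"P1": {"A": 0.8, "B": 0.2}}
labels = {"A": {"nucleus"}, "B": {"nucleus", "cytoplasm"}}
print(neighbour_features("P1", network, labels, locations))

train_x = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [5.0, 5.0]]
train_y = [{"nucleus"}, {"nucleus", "cytoplasm"}, {"cytoplasm"}, {"mitochondrion"}]
membership, predicted = fuzzy_knn([0.5, 0.5], train_x, train_y, locations)
print(predicted)  # ['nucleus', 'cytoplasm']
```

Weighting each of the k neighbours by 1/d^(2/(m-1)) lets closer training proteins dominate the membership; because training proteins may carry several labels, memberships across locations need not sum to 1, which is what allows a multi-location prediction.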
Institution: National Center for Biotechnology Information
Record format: MEDLINE/PubMed
Journal: BMC Bioinformatics (section: Research); BioMed Central, 2009-01-30
Copyright © 2009 Tung and Lee; licensee BioMed Central Ltd. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.