Protein subcellular localization prediction of eukaryotes using a knowledge-based approach

BACKGROUND: The study of protein subcellular localization (PSL) is important for elucidating protein functions involved in various cellular processes. However, determining the localization sites of a protein through wet-lab experiments can be time-consuming and labor-intensive. Thus, computational a...

Bibliographic Details
Main authors: Lin, Hsin-Nan; Chen, Ching-Tai; Sung, Ting-Yi; Ho, Shinn-Ying; Hsu, Wen-Lian
Format: Text
Language: English
Published: BioMed Central 2009
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2788359/
https://www.ncbi.nlm.nih.gov/pubmed/19958518
http://dx.doi.org/10.1186/1471-2105-10-S15-S8
_version_ 1782174963404374016
author Lin, Hsin-Nan
Chen, Ching-Tai
Sung, Ting-Yi
Ho, Shinn-Ying
Hsu, Wen-Lian
author_facet Lin, Hsin-Nan
Chen, Ching-Tai
Sung, Ting-Yi
Ho, Shinn-Ying
Hsu, Wen-Lian
author_sort Lin, Hsin-Nan
collection PubMed
description BACKGROUND: The study of protein subcellular localization (PSL) is important for elucidating protein functions involved in various cellular processes. However, determining the localization sites of a protein through wet-lab experiments can be time-consuming and labor-intensive. Thus, computational approaches are highly desirable. Most PSL prediction systems are designed for single-localized proteins. However, a significant number of eukaryotic proteins are known to be localized in multiple subcellular organelles. Many studies have shown that proteins may simultaneously reside in or move between different cellular compartments and be involved in different biological processes with different roles. RESULTS: In this study, we propose a knowledge-based method, called KnowPred(site), to predict the localization site(s) of both single-localized and multi-localized proteins. Based on local similarity, we can identify the "related sequences" for prediction. We construct a knowledge base to record the possible sequence variations for protein sequences. When predicting the localization annotation of a query protein, we search against the knowledge base and use a scoring mechanism to determine the predicted sites. We downloaded the dataset from ngLOC, which consisted of ten distinct subcellular organelles from 1923 species, and performed ten-fold cross-validation experiments to evaluate KnowPred(site)'s performance. The experimental results show that KnowPred(site) achieves higher prediction accuracy than ngLOC and the Blast-hit method. For single-localized proteins, the overall accuracy of KnowPred(site) is 91.7%. For multi-localized proteins, the overall accuracy of KnowPred(site) is 72.1%, which is significantly higher than that of ngLOC by 12.4%. Notably, half of the proteins in the dataset that cannot find any Blast hit sequence above a specified threshold can still be correctly predicted by KnowPred(site).
CONCLUSION: KnowPred(site) demonstrates the power of identifying related sequences in the knowledge base. The experimental results show that even when the sequence similarity is low, the local similarity is effective for prediction. They also show that KnowPred(site) is a highly accurate prediction method for both single- and multi-localized proteins. It is worth mentioning that the prediction process of KnowPred(site) is transparent and biologically interpretable, and that it shows the set of template sequences used to generate the prediction result. The KnowPred(site) prediction server is available at http://bio-cluster.iis.sinica.edu.tw/kbloc/.
format Text
id pubmed-2788359
institution National Center for Biotechnology Information
language English
publishDate 2009
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-2788359 2009-12-04 Protein subcellular localization prediction of eukaryotes using a knowledge-based approach Lin, Hsin-Nan Chen, Ching-Tai Sung, Ting-Yi Ho, Shinn-Ying Hsu, Wen-Lian BMC Bioinformatics Proceedings BACKGROUND: The study of protein subcellular localization (PSL) is important for elucidating protein functions involved in various cellular processes. However, determining the localization sites of a protein through wet-lab experiments can be time-consuming and labor-intensive. Thus, computational approaches are highly desirable. Most PSL prediction systems are designed for single-localized proteins. However, a significant number of eukaryotic proteins are known to be localized in multiple subcellular organelles. Many studies have shown that proteins may simultaneously reside in or move between different cellular compartments and be involved in different biological processes with different roles. RESULTS: In this study, we propose a knowledge-based method, called KnowPred(site), to predict the localization site(s) of both single-localized and multi-localized proteins. Based on local similarity, we can identify the "related sequences" for prediction. We construct a knowledge base to record the possible sequence variations for protein sequences. When predicting the localization annotation of a query protein, we search against the knowledge base and use a scoring mechanism to determine the predicted sites. We downloaded the dataset from ngLOC, which consisted of ten distinct subcellular organelles from 1923 species, and performed ten-fold cross-validation experiments to evaluate KnowPred(site)'s performance. The experimental results show that KnowPred(site) achieves higher prediction accuracy than ngLOC and the Blast-hit method. For single-localized proteins, the overall accuracy of KnowPred(site) is 91.7%.
For multi-localized proteins, the overall accuracy of KnowPred(site) is 72.1%, which is significantly higher than that of ngLOC by 12.4%. Notably, half of the proteins in the dataset that cannot find any Blast hit sequence above a specified threshold can still be correctly predicted by KnowPred(site). CONCLUSION: KnowPred(site) demonstrates the power of identifying related sequences in the knowledge base. The experimental results show that even when the sequence similarity is low, the local similarity is effective for prediction. They also show that KnowPred(site) is a highly accurate prediction method for both single- and multi-localized proteins. It is worth mentioning that the prediction process of KnowPred(site) is transparent and biologically interpretable, and that it shows the set of template sequences used to generate the prediction result. The KnowPred(site) prediction server is available at http://bio-cluster.iis.sinica.edu.tw/kbloc/. BioMed Central 2009-12-03 /pmc/articles/PMC2788359/ /pubmed/19958518 http://dx.doi.org/10.1186/1471-2105-10-S15-S8 Text en Copyright ©2009 Lin et al; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
spellingShingle Proceedings
Lin, Hsin-Nan
Chen, Ching-Tai
Sung, Ting-Yi
Ho, Shinn-Ying
Hsu, Wen-Lian
Protein subcellular localization prediction of eukaryotes using a knowledge-based approach
title Protein subcellular localization prediction of eukaryotes using a knowledge-based approach
title_full Protein subcellular localization prediction of eukaryotes using a knowledge-based approach
title_fullStr Protein subcellular localization prediction of eukaryotes using a knowledge-based approach
title_full_unstemmed Protein subcellular localization prediction of eukaryotes using a knowledge-based approach
title_short Protein subcellular localization prediction of eukaryotes using a knowledge-based approach
title_sort protein subcellular localization prediction of eukaryotes using a knowledge-based approach
topic Proceedings
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2788359/
https://www.ncbi.nlm.nih.gov/pubmed/19958518
http://dx.doi.org/10.1186/1471-2105-10-S15-S8
work_keys_str_mv AT linhsinnan proteinsubcellularlocalizationpredictionofeukaryotesusingaknowledgebasedapproach
AT chenchingtai proteinsubcellularlocalizationpredictionofeukaryotesusingaknowledgebasedapproach
AT sungtingyi proteinsubcellularlocalizationpredictionofeukaryotesusingaknowledgebasedapproach
AT hoshinnying proteinsubcellularlocalizationpredictionofeukaryotesusingaknowledgebasedapproach
AT hsuwenlian proteinsubcellularlocalizationpredictionofeukaryotesusingaknowledgebasedapproach