Prediction of nuclear proteins using nuclear translocation signals proposed by probabilistic latent semantic indexing
BACKGROUND: Identification of subcellular localization in proteins is crucial to elucidate cellular processes and molecular functions in a cell. However, given a tremendous amount of sequence data generated in the post-genomic era, determining protein localization based on biological experiments can...
Main authors: | Su, Emily Chia-Yu; Chang, Jia-Ming; Cheng, Cheng-Wei; Sung, Ting-Yi; Hsu, Wen-Lian |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central 2012 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3521467/ https://www.ncbi.nlm.nih.gov/pubmed/23282098 http://dx.doi.org/10.1186/1471-2105-13-S17-S13 |
_version_ | 1782252960763346944 |
---|---|
author | Su, Emily Chia-Yu; Chang, Jia-Ming; Cheng, Cheng-Wei; Sung, Ting-Yi; Hsu, Wen-Lian |
author_facet | Su, Emily Chia-Yu; Chang, Jia-Ming; Cheng, Cheng-Wei; Sung, Ting-Yi; Hsu, Wen-Lian |
author_sort | Su, Emily Chia-Yu |
collection | PubMed |
description | BACKGROUND: Identifying the subcellular localization of proteins is crucial for elucidating cellular processes and molecular functions in a cell. However, given the tremendous amount of sequence data generated in the post-genomic era, determining protein localization through biological experiments can be expensive and time-consuming. Developing prediction systems that analyze uncharacterized proteins efficiently has therefore played an important role in high-throughput protein analyses. In a eukaryotic cell, many essential biological processes take place in the nucleus. Nuclear proteins shuttle between the nucleus and the cytoplasm based on recognition of nuclear translocation signals, including nuclear localization signals (NLSs) and nuclear export signals (NESs). Currently, only a few approaches have been developed specifically to predict nuclear localization using sequence features such as putative NLSs, and prediction coverage based on NLSs has been shown to be very low. In addition, most existing approaches attain a prediction accuracy of only about 54% to 70% and a Matthews correlation coefficient (MCC) of about 0.250 to 0.380 on an independent test set. Moreover, no predictor can generate sequence motifs that characterize features of potential NESs, whose biological properties are not well understood from existing experimental studies. RESULTS: In this study, we first propose PSLNuc (Protein Subcellular Localization prediction for Nucleus) for predicting nuclear localization in proteins. For feature representation, a protein is represented by gapped-dipeptides, and the feature values are weighted by homology information from a smoothed position-specific scoring matrix. We then apply probabilistic latent semantic indexing (PLSI) for feature reduction. Finally, the reduced features are used as input for a support vector machine (SVM) classifier. In addition to PSLNuc, we identify gapped-dipeptide signatures for putative NLSs and NESs to develop a second prediction method, PSLNTS (Protein Subcellular Localization prediction using Nuclear Translocation Signals). We apply PLSI to generate gapped-dipeptide signatures from both nuclear and non-nuclear proteins, and propose candidate sequence motifs for putative NLSs and NESs. In PSLNTS, only the proposed gapped-dipeptide signatures are used in an SVM classifier, mimicking the biological properties of NLSs and NESs to predict nuclear localization. CONCLUSIONS: Experimental results demonstrate that the proposed method significantly improves nuclear localization prediction. To compare our predictive performance with other approaches, we use two non-redundant benchmark data sets, a training set and an independent test set. Evaluated by five-fold cross-validation on the training set, PSLNuc attains an overall accuracy of 79.7%, a 4.8% improvement over the state-of-the-art system, and raises the MCC from 0.497 to 0.595. On the independent test set, PSLNuc outperforms other predictors by 3.9% to 19.9% in accuracy and by 0.077 to 0.207 in MCC. This suggests that, in addition to NLSs, which have been shown to be important for nuclear proteins, NESs can also be an effective indicator for detecting non-nuclear proteins. Most notably, using only a few proposed gapped-dipeptide signatures as input features for the SVM classifier, PSLNTS further raises the accuracy and MCC to 80.9% and 0.618, respectively. Our results demonstrate that gapped-dipeptide signatures can better discriminate nuclear from non-nuclear proteins. Moreover, the proposed gapped-dipeptide signatures can be biologically interpreted and used in further experimental analyses of nuclear translocation signals, including NLSs and NESs. |
format | Online Article Text |
id | pubmed-3521467 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2012 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-3521467 2012-12-14 Prediction of nuclear proteins using nuclear translocation signals proposed by probabilistic latent semantic indexing Su, Emily Chia-Yu Chang, Jia-Ming Cheng, Cheng-Wei Sung, Ting-Yi Hsu, Wen-Lian BMC Bioinformatics Proceedings BioMed Central 2012-12-07 /pmc/articles/PMC3521467/ /pubmed/23282098 http://dx.doi.org/10.1186/1471-2105-13-S17-S13 Text en Copyright ©2012 Su et al.; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. |
spellingShingle | Proceedings Su, Emily Chia-Yu Chang, Jia-Ming Cheng, Cheng-Wei Sung, Ting-Yi Hsu, Wen-Lian Prediction of nuclear proteins using nuclear translocation signals proposed by probabilistic latent semantic indexing |
title | Prediction of nuclear proteins using nuclear translocation signals proposed by probabilistic latent semantic indexing |
title_full | Prediction of nuclear proteins using nuclear translocation signals proposed by probabilistic latent semantic indexing |
title_fullStr | Prediction of nuclear proteins using nuclear translocation signals proposed by probabilistic latent semantic indexing |
title_full_unstemmed | Prediction of nuclear proteins using nuclear translocation signals proposed by probabilistic latent semantic indexing |
title_short | Prediction of nuclear proteins using nuclear translocation signals proposed by probabilistic latent semantic indexing |
title_sort | prediction of nuclear proteins using nuclear translocation signals proposed by probabilistic latent semantic indexing |
topic | Proceedings |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3521467/ https://www.ncbi.nlm.nih.gov/pubmed/23282098 http://dx.doi.org/10.1186/1471-2105-13-S17-S13 |
work_keys_str_mv | AT suemilychiayu predictionofnuclearproteinsusingnucleartranslocationsignalsproposedbyprobabilisticlatentsemanticindexing AT changjiaming predictionofnuclearproteinsusingnucleartranslocationsignalsproposedbyprobabilisticlatentsemanticindexing AT chengchengwei predictionofnuclearproteinsusingnucleartranslocationsignalsproposedbyprobabilisticlatentsemanticindexing AT sungtingyi predictionofnuclearproteinsusingnucleartranslocationsignalsproposedbyprobabilisticlatentsemanticindexing AT hsuwenlian predictionofnuclearproteinsusingnucleartranslocationsignalsproposedbyprobabilisticlatentsemanticindexing |
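
The abstract in this record describes representing a protein by gapped-dipeptides and reporting results as a Matthews correlation coefficient (MCC). The following is a minimal sketch of those two ideas, not the authors' implementation. It assumes a gapped-dipeptide is a pair of residues separated by a fixed number of intervening positions (the record does not define it), and it uses raw occurrence counts in place of the PSSM-based weights the abstract mentions. The function names, the `max_gap` parameter, and the example sequence are hypothetical.

```python
from collections import Counter
from math import sqrt


def gapped_dipeptide_counts(sequence: str, max_gap: int = 2) -> Counter:
    """Count (first_residue, gap, second_residue) triples in a sequence.

    Assumption: `gap` is the number of residues lying between the pair,
    so gap 0 is an ordinary adjacent dipeptide. Raw counts only; no PSSM weighting.
    """
    counts = Counter()
    for gap in range(max_gap + 1):
        step = gap + 1  # index distance between the two residues
        for i in range(len(sequence) - step):
            counts[(sequence[i], gap, sequence[i + step])] += 1
    return counts


def matthews_cc(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient from a 2x2 confusion matrix.

    Returns 0.0 when any marginal total is zero (undefined denominator),
    which is a common convention and an assumption here.
    """
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom


if __name__ == "__main__":
    # Hypothetical four-residue fragment, used only to show the output shape.
    for key, value in sorted(gapped_dipeptide_counts("MKKR", max_gap=1).items()):
        print(key, value)
    # Made-up confusion-matrix counts, not results from the paper.
    print(matthews_cc(tp=40, tn=45, fp=5, fn=10))
```

In a pipeline like the one the abstract outlines, count vectors of this kind would be the input to a dimensionality-reduction step and then a classifier; those stages are omitted here because the record gives no parameters for them.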