Prediction of nuclear proteins using SVM and HMM models
BACKGROUND: The nucleus, a highly organized organelle, plays an important role in cellular homeostasis. Nuclear proteins are crucial for chromosomal maintenance/segregation, gene expression, RNA processing/export, and many other processes. Several methods have been developed for predicting the nucl...
Main authors: | Kumar, Manish; Raghava, Gajendra PS |
---|---|
Format: | Text |
Language: | English |
Published: | BioMed Central 2009 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2632991/ https://www.ncbi.nlm.nih.gov/pubmed/19152693 http://dx.doi.org/10.1186/1471-2105-10-22 |
_version_ | 1782164063667617792 |
---|---|
author | Kumar, Manish Raghava, Gajendra PS |
author_facet | Kumar, Manish Raghava, Gajendra PS |
author_sort | Kumar, Manish |
collection | PubMed |
description | BACKGROUND: The nucleus, a highly organized organelle, plays an important role in cellular homeostasis. Nuclear proteins are crucial for chromosomal maintenance/segregation, gene expression, RNA processing/export, and many other processes. Several methods have been developed in the past for predicting nuclear proteins. The aim of the present study is to develop a new method for predicting nuclear proteins with higher accuracy. RESULTS: All modules were trained and tested on a non-redundant dataset and evaluated using five-fold cross-validation. First, Support Vector Machine (SVM) based modules were developed using amino acid and dipeptide compositions, achieving Matthews correlation coefficients (MCC) of 0.59 and 0.61, respectively. Second, SVM modules were developed using split amino acid composition (SAAC), achieving a maximum MCC of 0.66. Third, a hidden Markov model (HMM) based module/profile was developed for searching for exclusively nuclear and non-nuclear domains in a protein. Finally, a hybrid module was developed by combining the SVM module and the HMM profile, achieving an MCC of 0.87 with an accuracy of 94.61%. This method performs better than existing methods when evaluated on blind/independent datasets. Our method estimated 31.51%, 21.89%, 26.31%, 25.72% and 24.95% of the proteins to be nuclear in the Saccharomyces cerevisiae, Caenorhabditis elegans, Drosophila melanogaster, mouse and human proteomes, respectively. Based on the above modules, we have developed a web server, NpPred, for predicting nuclear proteins. CONCLUSION: This study describes a highly accurate method for predicting nuclear proteins. An SVM module has been developed for the first time using SAAC for predicting nuclear proteins, where the amino acid compositions of the N-terminus and of the remaining protein were computed separately. In addition, our study is the first to identify exclusively nuclear and non-nuclear domains and use them for predicting nuclear proteins. The performance of the method improved further when both approaches were combined. |
format | Text |
id | pubmed-2632991 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2009 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-2632991 2009-01-30 Prediction of nuclear proteins using SVM and HMM models Kumar, Manish Raghava, Gajendra PS BMC Bioinformatics Research Article BioMed Central 2009-01-19 /pmc/articles/PMC2632991/ /pubmed/19152693 http://dx.doi.org/10.1186/1471-2105-10-22 Text en Copyright © 2009 Kumar and Raghava; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. |
spellingShingle | Research Article Kumar, Manish Raghava, Gajendra PS Prediction of nuclear proteins using SVM and HMM models |
title | Prediction of nuclear proteins using SVM and HMM models |
title_full | Prediction of nuclear proteins using SVM and HMM models |
title_fullStr | Prediction of nuclear proteins using SVM and HMM models |
title_full_unstemmed | Prediction of nuclear proteins using SVM and HMM models |
title_short | Prediction of nuclear proteins using SVM and HMM models |
title_sort | prediction of nuclear proteins using svm and hmm models |
topic | Research Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2632991/ https://www.ncbi.nlm.nih.gov/pubmed/19152693 http://dx.doi.org/10.1186/1471-2105-10-22 |
work_keys_str_mv | AT kumarmanish predictionofnuclearproteinsusingsvmandhmmmodels AT raghavagajendraps predictionofnuclearproteinsusingsvmandhmmmodels |
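
The abstract in this record names three composition-based feature sets (amino acid composition, dipeptide composition, and split amino acid composition, where the N-terminus and the remaining protein are computed separately) and reports results as a Matthews correlation coefficient. The following is a minimal Python sketch of those quantities, not the authors' implementation. The function names are hypothetical, and the `n_term_len` parameter with its default of 25 residues is an assumption made for illustration, since the record does not state the split length that was used.

```python
from itertools import product
from math import sqrt

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def aa_composition(seq):
    """Fraction of each standard amino acid in seq (20 values)."""
    seq = seq.upper()
    counts = [seq.count(a) for a in AMINO_ACIDS]
    total = sum(counts)  # non-standard letters are ignored
    if total == 0:
        return [0.0] * len(AMINO_ACIDS)
    return [c / total for c in counts]


def dipeptide_composition(seq):
    """Fraction of each ordered pair of adjacent residues (400 values)."""
    seq = seq.upper()
    pairs = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]
    counts = dict.fromkeys(pairs, 0)
    for i in range(len(seq) - 1):
        pair = seq[i:i + 2]
        if pair in counts:  # skip pairs containing non-standard letters
            counts[pair] += 1
    total = sum(counts.values())
    return [counts[p] / total if total else 0.0 for p in pairs]


def split_aa_composition(seq, n_term_len=25):
    """Composition of the N-terminus and of the rest, concatenated (40 values).

    n_term_len is a hypothetical parameter; the record does not give
    the split length used in the study.
    """
    return aa_composition(seq[:n_term_len]) + aa_composition(seq[n_term_len:])


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts."""
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention, return 0.0 when any marginal total is zero.
    return (tp * tn - fp * fn) / denom if denom else 0.0
```

Each function returns a plain list of floats, so the vectors can be passed to any SVM library as a feature row. Training the classifier and building the HMM profiles are outside the scope of this sketch.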