
Prediction of nuclear proteins using SVM and HMM models

BACKGROUND: The nucleus, a highly organized organelle, plays an important role in cellular homeostasis. Nuclear proteins are crucial for chromosomal maintenance/segregation, gene expression, RNA processing/export, and many other processes. Several methods have been developed for predicting the nucl...


Bibliographic Details
Main authors: Kumar, Manish, Raghava, Gajendra PS
Format: Text
Language: English
Published: BioMed Central 2009
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2632991/
https://www.ncbi.nlm.nih.gov/pubmed/19152693
http://dx.doi.org/10.1186/1471-2105-10-22
_version_ 1782164063667617792
author Kumar, Manish
Raghava, Gajendra PS
author_facet Kumar, Manish
Raghava, Gajendra PS
author_sort Kumar, Manish
collection PubMed
description BACKGROUND: The nucleus, a highly organized organelle, plays an important role in cellular homeostasis. Nuclear proteins are crucial for chromosomal maintenance/segregation, gene expression, RNA processing/export, and many other processes. Several methods have been developed in the past for predicting nuclear proteins. The aim of the present study is to develop a new method for predicting nuclear proteins with higher accuracy. RESULTS: All modules were trained and tested on a non-redundant dataset and evaluated using five-fold cross-validation. First, Support Vector Machine (SVM) based modules were developed using amino acid and dipeptide compositions, achieving Matthews correlation coefficients (MCC) of 0.59 and 0.61, respectively. Second, we developed SVM modules using split amino acid composition (SAAC), achieving a maximum MCC of 0.66. Third, a hidden Markov model (HMM) based module/profile was developed for searching for exclusively nuclear and non-nuclear domains in a protein. Finally, a hybrid module was developed by combining the SVM module and the HMM profile; it achieved an MCC of 0.87 with an accuracy of 94.61%. This method performs better than existing methods when evaluated on blind/independent datasets. Our method estimated 31.51%, 21.89%, 26.31%, 25.72% and 24.95% of the proteins to be nuclear in the Saccharomyces cerevisiae, Caenorhabditis elegans, Drosophila melanogaster, mouse and human proteomes, respectively. Based on the above modules, we have developed a web server, NpPred, for predicting nuclear proteins. CONCLUSION: This study describes a highly accurate method for predicting nuclear proteins. For the first time, an SVM module for predicting nuclear proteins has been developed using SAAC, in which the amino acid compositions of the N-terminus and of the remaining protein were computed separately. In addition, this study is the first to identify exclusively nuclear and non-nuclear domains and to use them for predicting nuclear proteins. Combining both approaches further improved the performance of the method.
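The composition features and the MCC score named in the abstract can be illustrated with a minimal Python sketch. This is not the authors' code: the function names are hypothetical, and the N-terminal window length (`n_term_len=25`) is an assumption, since the abstract does not state where the sequence is split. The sketch only shows how amino acid composition (20 values), dipeptide composition (400 values), a two-part SAAC vector (40 values), and MCC from confusion-matrix counts might be computed.

```python
from itertools import product
from math import sqrt

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
DIPEPTIDES = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]


def aa_composition(seq):
    """Fraction of each standard residue in seq (20 values)."""
    seq = seq.upper()
    counts = [seq.count(a) for a in AMINO_ACIDS]
    total = sum(counts)  # non-standard characters are ignored
    return [c / total if total else 0.0 for c in counts]


def dipeptide_composition(seq):
    """Fraction of each overlapping residue pair in seq (400 values)."""
    seq = seq.upper()
    counts = dict.fromkeys(DIPEPTIDES, 0)
    for i in range(len(seq) - 1):
        pair = seq[i:i + 2]
        if pair in counts:
            counts[pair] += 1
    total = sum(counts.values())
    return [counts[p] / total if total else 0.0 for p in DIPEPTIDES]


def split_aa_composition(seq, n_term_len=25):
    """SAAC sketch: composition of the N-terminus and of the remaining
    protein, computed separately and concatenated (40 values).
    n_term_len is an assumed, illustrative window length."""
    return aa_composition(seq[:n_term_len]) + aa_composition(seq[n_term_len:])


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts."""
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0
```

Vectors of this kind would be the input to an SVM classifier; the kernel, parameters, and the way the SVM output is combined with the HMM profile are not given in this record and are left out of the sketch.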
format Text
id pubmed-2632991
institution National Center for Biotechnology Information
language English
publishDate 2009
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-2632991 2009-01-30 Prediction of nuclear proteins using SVM and HMM models Kumar, Manish Raghava, Gajendra PS BMC Bioinformatics Research Article BioMed Central 2009-01-19 /pmc/articles/PMC2632991/ /pubmed/19152693 http://dx.doi.org/10.1186/1471-2105-10-22 Text en Copyright © 2009 Kumar and Raghava; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
spellingShingle Research Article
Kumar, Manish
Raghava, Gajendra PS
Prediction of nuclear proteins using SVM and HMM models
title Prediction of nuclear proteins using SVM and HMM models
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2632991/
https://www.ncbi.nlm.nih.gov/pubmed/19152693
http://dx.doi.org/10.1186/1471-2105-10-22