An SVM-based system for predicting protein subnuclear localizations
BACKGROUND: The large gap between the number of protein sequences in databases and the number of functionally characterized proteins calls for the development of a fast computational tool for the prediction of subnuclear and subcellular localizations generally applicable to protein sequences. The in...
Main Authors: | Lei, Zhengdeng; Dai, Yang |
---|---|
Format: | Text |
Language: | English |
Published: | BioMed Central 2005 |
Subjects: | Methodology Article |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1325059/ https://www.ncbi.nlm.nih.gov/pubmed/16336650 http://dx.doi.org/10.1186/1471-2105-6-291 |
_version_ | 1782126468969529344 |
---|---|
author | Lei, Zhengdeng Dai, Yang |
author_facet | Lei, Zhengdeng Dai, Yang |
author_sort | Lei, Zhengdeng |
collection | PubMed |
description | BACKGROUND: The large gap between the number of protein sequences in databases and the number of functionally characterized proteins calls for the development of a fast computational tool for the prediction of subnuclear and subcellular localizations generally applicable to protein sequences. The information on localization may reveal the molecular function of novel proteins, in addition to providing insight on the biological pathways in which they function. The bulk of past work has been focused on protein subcellular localizations. Furthermore, no specific tool has been dedicated to prediction at the subnuclear level, despite its high importance. In order to design a suitable predictive system, the extraction of subtle sequence signals that can discriminate among proteins with different subnuclear localizations is the key. RESULTS: New kernel functions used in a support vector machine (SVM) learning model are introduced for the measurement of sequence similarity. The k-peptide vectors are first mapped by a matrix of high-scored pairs of k-peptides which are measured by BLOSUM62 scores. The kernels, measuring the similarity for sequences, are then defined on the mapped vectors. By combining these new encoding methods, a multi-class classification system for the prediction of protein subnuclear localizations is established for the first time. The performance of the system is evaluated with a set of proteins collected in the Nuclear Protein Database (NPD). The overall accuracy of prediction for 6 localizations is about 50% (vs. random prediction 16.7%) for single localization proteins in the leave-one-out cross-validation; and 65% for an independent set of multi-localization proteins. This integrated system can be accessed at . CONCLUSION: The integrated system benefits from the combination of predictions from several SVMs based on selected encoding methods. Finally, the predictive power of the system is expected to improve as more proteins with known subnuclear localizations become available. |
format | Text |
id | pubmed-1325059 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2005 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-13250592006-01-24 An SVM-based system for predicting protein subnuclear localizations Lei, Zhengdeng Dai, Yang BMC Bioinformatics Methodology Article BACKGROUND: The large gap between the number of protein sequences in databases and the number of functionally characterized proteins calls for the development of a fast computational tool for the prediction of subnuclear and subcellular localizations generally applicable to protein sequences. The information on localization may reveal the molecular function of novel proteins, in addition to providing insight on the biological pathways in which they function. The bulk of past work has been focused on protein subcellular localizations. Furthermore, no specific tool has been dedicated to prediction at the subnuclear level, despite its high importance. In order to design a suitable predictive system, the extraction of subtle sequence signals that can discriminate among proteins with different subnuclear localizations is the key. RESULTS: New kernel functions used in a support vector machine (SVM) learning model are introduced for the measurement of sequence similarity. The k-peptide vectors are first mapped by a matrix of high-scored pairs of k-peptides which are measured by BLOSUM62 scores. The kernels, measuring the similarity for sequences, are then defined on the mapped vectors. By combining these new encoding methods, a multi-class classification system for the prediction of protein subnuclear localizations is established for the first time. The performance of the system is evaluated with a set of proteins collected in the Nuclear Protein Database (NPD). The overall accuracy of prediction for 6 localizations is about 50% (vs. random prediction 16.7%) for single localization proteins in the leave-one-out cross-validation; and 65% for an independent set of multi-localization proteins. This integrated system can be accessed at . CONCLUSION: The integrated system benefits from the combination of predictions from several SVMs based on selected encoding methods. Finally, the predictive power of the system is expected to improve as more proteins with known subnuclear localizations become available. BioMed Central 2005-12-07 /pmc/articles/PMC1325059/ /pubmed/16336650 http://dx.doi.org/10.1186/1471-2105-6-291 Text en Copyright © 2005 Lei and Dai; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. |
spellingShingle | Methodology Article Lei, Zhengdeng Dai, Yang An SVM-based system for predicting protein subnuclear localizations |
title | An SVM-based system for predicting protein subnuclear localizations |
title_full | An SVM-based system for predicting protein subnuclear localizations |
title_fullStr | An SVM-based system for predicting protein subnuclear localizations |
title_full_unstemmed | An SVM-based system for predicting protein subnuclear localizations |
title_short | An SVM-based system for predicting protein subnuclear localizations |
title_sort | svm-based system for predicting protein subnuclear localizations |
topic | Methodology Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1325059/ https://www.ncbi.nlm.nih.gov/pubmed/16336650 http://dx.doi.org/10.1186/1471-2105-6-291 |
work_keys_str_mv | AT leizhengdeng ansvmbasedsystemforpredictingproteinsubnuclearlocalizations AT daiyang ansvmbasedsystemforpredictingproteinsubnuclearlocalizations AT leizhengdeng svmbasedsystemforpredictingproteinsubnuclearlocalizations AT daiyang svmbasedsystemforpredictingproteinsubnuclearlocalizations |
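
The description above mentions k-peptide vectors and kernels that measure sequence similarity. As a rough illustration only, the sketch below counts overlapping k-peptides and compares two sequences with a plain cosine-normalized spectrum kernel. This is a simpler, swapped-in technique: it does not apply the BLOSUM62-based mapping of high-scored k-peptide pairs that the record describes, and the function names, the default `k=2`, and the example sequences are all hypothetical. The last line reproduces the 16.7% random-prediction baseline as 1 in 6 localizations.

```python
from collections import Counter
from math import sqrt


def kpeptide_counts(sequence, k=2):
    """Count every overlapping k-peptide (length-k substring) in a sequence."""
    return Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))


def spectrum_kernel(seq_a, seq_b, k=2):
    """Cosine-normalized dot product of two k-peptide count vectors.

    Assumption: exact k-peptide matches only. The BLOSUM62-based mapping
    described in the record is NOT implemented here.
    """
    a = kpeptide_counts(seq_a, k)
    b = kpeptide_counts(seq_b, k)
    # Counter returns 0 for k-peptides that are absent from b.
    dot = sum(count * b[pep] for pep, count in a.items())
    norm_a = sqrt(sum(c * c for c in a.values()))
    norm_b = sqrt(sum(c * c for c in b.values()))
    norm = norm_a * norm_b
    # Sequences shorter than k have empty vectors; treat similarity as 0.
    return dot / norm if norm else 0.0


# Chance-level accuracy when guessing uniformly among 6 localizations.
random_baseline = 1 / 6
```

A kernel of this shape could be evaluated pairwise to build a Gram matrix for an SVM library that accepts precomputed kernels; which library and settings the original system used is not stated in this record.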