An Ensemble Method for Predicting Subnuclear Localizations from Primary Protein Structures

Bibliographic Details
Main authors: Han, Guo Sheng; Yu, Zu Guo; Anh, Vo; Krishnajith, Anaththa P. D.; Tian, Yu-Chu
Format: Online article (text)
Language: English
Published: Public Library of Science, 2013
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3584121/
https://www.ncbi.nlm.nih.gov/pubmed/23460833
http://dx.doi.org/10.1371/journal.pone.0057225

Abstract

BACKGROUND: Predicting protein subnuclear localization is a challenging problem. Previous methods based on non-sequence information, such as Gene Ontology annotations and kernel fusion, each have their own limitations. The aim of this work is twofold: first, to propose a novel individual feature extraction method; second, to develop an ensemble method that improves prediction performance by using comprehensive information represented as a high-dimensional feature vector obtained from 11 feature extraction methods.

METHODOLOGY/PRINCIPAL FINDINGS: A novel two-stage multiclass support vector machine is proposed to predict protein subnuclear localizations. It considers only feature extraction methods based on amino acid classifications and physicochemical properties. To speed up the system, the kernel parameter is chosen by an automatic search method. Prediction performance is evaluated on four datasets: the Lei dataset, a multi-localization dataset, the SNL9 dataset and a new independent dataset. In leave-one-out cross-validation, the overall prediction accuracy is 75.2% for the 6 localizations of the Lei dataset and 72.1% for the 9 localizations of the SNL9 dataset; it is 71.7% for the multi-localization dataset and 69.8% for the new independent dataset. Comparisons with existing methods show that our method performs better for both single-localization and multi-localization proteins and achieves more balanced sensitivities and specificities across both large and small localization classes. The overall accuracy improves by 4.0% and 4.7% for single-localization proteins and by 6.5% for multi-localization proteins. Permutation analysis further confirms the reliability and stability of the classification model.

CONCLUSIONS: Our method is effective and valuable for predicting protein subnuclear localizations. A web server implementing the proposed method is freely available at http://bioinformatics.awowshop.com/snlpred_page.php.
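
The abstract says the predictor relies only on features derived from amino acid classifications and physicochemical properties, but it does not list the 11 encodings. As a minimal sketch, assuming a three-group hydrophobicity partition (polar, neutral, hydrophobic) of the kind used in composition/transition descriptors, the Python below turns a sequence into group-composition and group-transition frequencies. The grouping and the helper names (`encode`, `composition`, `transition`, `feature_vector`) are illustrative assumptions, not the paper's feature set.

```python
from itertools import combinations

# Assumed three-group hydrophobicity partition, used here only for illustration;
# it is not necessarily one of the paper's 11 feature extraction methods.
GROUPS = {
    "polar": set("RKEDQN"),
    "neutral": set("GASTPHY"),
    "hydrophobic": set("CLVIMFW"),
}
GROUP_NAMES = list(GROUPS)


def encode(sequence: str) -> list[str]:
    """Map each residue to its group name, skipping non-standard residues."""
    encoded = []
    for residue in sequence.upper():
        for name, members in GROUPS.items():
            if residue in members:
                encoded.append(name)
                break
    return encoded


def composition(sequence: str) -> list[float]:
    """Fraction of (standard) residues that fall in each group."""
    encoded = encode(sequence)
    if not encoded:
        return [0.0] * len(GROUP_NAMES)
    return [encoded.count(name) / len(encoded) for name in GROUP_NAMES]


def transition(sequence: str) -> list[float]:
    """Fraction of adjacent residue pairs that switch between each pair of groups."""
    encoded = encode(sequence)
    pairs = list(zip(encoded, encoded[1:]))
    group_pairs = list(combinations(GROUP_NAMES, 2))
    if not pairs:
        return [0.0] * len(group_pairs)
    features = []
    for a, b in group_pairs:
        # Count A->B and B->A switches together, as in transition descriptors.
        switches = sum(1 for x, y in pairs if {x, y} == {a, b})
        features.append(switches / len(pairs))
    return features


def feature_vector(sequence: str) -> list[float]:
    """Concatenate composition and transition features into one vector."""
    return composition(sequence) + transition(sequence)
```

For example, `feature_vector("KLAK")` gives compositions of 0.5, 0.25 and 0.25 for the polar, neutral and hydrophobic groups, and each of the three group-pair transitions occurs in one of the three adjacent pairs. In an ensemble of the kind described, vectors from several such encodings would be concatenated into one high-dimensional input for the classifier.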
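
The reported results combine leave-one-out cross-validation with overall accuracy and per-class sensitivity and specificity. A minimal sketch of how these quantities are usually computed is below, assuming one-vs-rest definitions for sensitivity and specificity. `train_and_predict` is a hypothetical stand-in for the paper's two-stage SVM, which is not reproduced here.

```python
import math


def leave_one_out_predictions(samples, labels, train_and_predict):
    """Predict each sample with a model trained on all the other samples (LOOCV).

    `train_and_predict(train_x, train_y, x)` is a hypothetical callable standing in
    for the real classifier; it must return a predicted label for `x`.
    """
    predictions = []
    for i in range(len(samples)):
        train_x = samples[:i] + samples[i + 1:]
        train_y = labels[:i] + labels[i + 1:]
        predictions.append(train_and_predict(train_x, train_y, samples[i]))
    return predictions


def evaluate(true_labels, predicted_labels):
    """Overall accuracy plus one-vs-rest sensitivity and specificity per class."""
    if not true_labels or len(true_labels) != len(predicted_labels):
        raise ValueError("label lists must be non-empty and of equal length")
    n = len(true_labels)
    pairs = list(zip(true_labels, predicted_labels))
    accuracy = sum(t == p for t, p in pairs) / n
    per_class = {}
    for cls in sorted(set(true_labels) | set(predicted_labels)):
        tp = sum(t == cls and p == cls for t, p in pairs)
        fn = sum(t == cls and p != cls for t, p in pairs)
        fp = sum(t != cls and p == cls for t, p in pairs)
        tn = n - tp - fn - fp
        per_class[cls] = {
            # Fraction of true members of `cls` that were predicted as `cls`.
            "sensitivity": tp / (tp + fn) if tp + fn else math.nan,
            # Fraction of non-members of `cls` that were not predicted as `cls`.
            "specificity": tn / (tn + fp) if tn + fp else math.nan,
        }
    return {"accuracy": accuracy, "per_class": per_class}
```

Because each prediction comes from a model that never saw that protein, pooling the leave-one-out predictions and passing them to `evaluate` yields a cross-validated accuracy, while the per-class sensitivity and specificity show whether small localization classes are being sacrificed for large ones.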

Record Details
Record ID: pubmed-3584121
Institution: National Center for Biotechnology Information
Record format: MEDLINE/PubMed
Journal: PLoS One (Research Article), Public Library of Science, published 2013-02-27
Copyright: © 2013 Han et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are properly credited.