
mGOASVM: Multi-label protein subcellular localization based on gene ontology and support vector machines

Bibliographic Details
Main authors: Wan, Shibiao; Mak, Man-Wai; Kung, Sun-Yuan
Format: Online Article Text
Language: English
Published: BioMed Central 2012
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3582598/
https://www.ncbi.nlm.nih.gov/pubmed/23130999
http://dx.doi.org/10.1186/1471-2105-13-290
collection PubMed
description BACKGROUND: Although many computational methods have been developed to predict protein subcellular localization, most of them are limited to single-location proteins. Multi-location proteins are either not considered or assumed not to exist. However, proteins with multiple locations are particularly interesting because they may have special biological functions, which are essential to both basic research and drug discovery.
RESULTS: This paper proposes an efficient multi-label predictor, called mGOASVM, for predicting the subcellular localization of multi-location proteins. Given a protein, the accession numbers of its homologs are obtained via a BLAST search. Then, the protein's own accession number and those of its homologs are used as keys to search the Gene Ontology (GO) annotation database for a set of GO terms. Given a set of training proteins, a set of T relevant GO terms is obtained by collecting all GO terms in the GO annotation database that are relevant to the training proteins. These relevant GO terms form the basis of a T-dimensional Euclidean space in which the GO vectors lie. A support vector machine (SVM) classifier with a new decision scheme is proposed to classify the multi-label GO vectors. The mGOASVM predictor has three advantages: (1) it uses the frequency of occurrence of GO terms as the feature representation; (2) it selects the relevant GO subspace, which can substantially speed up prediction without compromising performance; and (3) it adopts an efficient multi-label SVM classifier that significantly outperforms other predictors. On two recently published virus and plant datasets, mGOASVM achieves actual accuracies of 88.9% and 87.4%, respectively, significantly higher than those of state-of-the-art predictors such as iLoc-Virus (74.8%) and iLoc-Plant (68.1%).
CONCLUSIONS: mGOASVM can efficiently predict the subcellular locations of multi-label proteins. The mGOASVM predictor is available online at http://bioinfo.eie.polyu.edu.hk/mGoaSvmServer/mGOASVM.html.
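The feature construction and multi-label decision outlined in this abstract can be sketched as follows. This is a minimal Python sketch under stated assumptions, not the authors' implementation: the BLAST search and GO-database lookups are replaced by hypothetical precomputed lists of GO IDs, the per-class SVM scores are taken as given rather than trained, the decision rule (keep every class with a positive score, otherwise the single top-scoring class) is an assumed stand-in for the paper's decision scheme, and "actual accuracy" is assumed to mean an exact match between the predicted and true location sets. The function names `relevant_go_terms`, `go_vector`, `decide` and `actual_accuracy` are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Sequence, Set


def relevant_go_terms(training_go: Dict[str, List[str]]) -> List[str]:
    """Collect the T distinct GO terms seen in the training proteins' annotations."""
    terms: Set[str] = set()
    for go_list in training_go.values():
        terms.update(go_list)
    # A fixed (sorted) ordering defines the T axes of the GO space.
    return sorted(terms)


def go_vector(go_hits: List[str], space: List[str]) -> List[int]:
    """Count how often each relevant GO term was retrieved for one protein.

    go_hits may contain repeats because the same term can be returned for the
    query accession and for several BLAST homologs. Terms outside the
    relevant subspace are ignored.
    """
    counts = Counter(go_hits)
    return [counts[term] for term in space]


def decide(scores: Sequence[float]) -> Set[int]:
    """ASSUMED multi-label rule: keep every class with a positive SVM score,
    otherwise fall back to the single best-scoring class."""
    positive = {m for m, s in enumerate(scores) if s > 0}
    if positive:
        return positive
    return {max(range(len(scores)), key=lambda m: scores[m])}


def actual_accuracy(predicted: Sequence[Set[int]], truth: Sequence[Set[int]]) -> float:
    """ASSUMED definition: fraction of proteins whose predicted label set
    exactly equals the true label set."""
    hits = sum(1 for p, t in zip(predicted, truth) if p == t)
    return hits / len(truth)


if __name__ == "__main__":
    # Hypothetical training annotations (protein IDs are placeholders).
    training = {
        "P1": ["GO:0005634", "GO:0005737"],
        "P2": ["GO:0005737", "GO:0009507"],
    }
    space = relevant_go_terms(training)
    print(space)
    print(go_vector(["GO:0005737", "GO:0005737", "GO:0016020"], space))
    print(decide([0.7, -0.2, 0.1]), decide([-0.5, -0.1, -0.9]))
    print(actual_accuracy([{0, 2}, {1}], [{0, 2}, {0, 1}]))
```

Building the space from the training annotations alone is what keeps the vectors T-dimensional: a GO term that never occurs in training (GO:0016020 in the example) simply drops out of the query vector. The exact-match accuracy is deliberately strict, since a protein counts as correct only when every one of its locations is predicted and no extra location is added.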
id pubmed-3582598
institution National Center for Biotechnology Information
record_format MEDLINE/PubMed
journal BMC Bioinformatics, published 2012-11-06
rights Copyright ©2012 Wan et al.; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
topic Methodology Article