Cargando…

HybridGO-Loc: Mining Hybrid Features on Gene Ontology for Predicting Subcellular Localization of Multi-Location Proteins

Protein subcellular localization prediction, as an essential step to elucidate the functions in vivo of proteins and identify drugs targets, has been extensively studied in previous decades. Instead of only determining subcellular localization of single-label proteins, recent studies have focused on...

Descripción completa

Detalles Bibliográficos
Autores principales: Wan, Shibiao, Mak, Man-Wai, Kung, Sun-Yuan
Formato: Online Artículo Texto
Lenguaje:English
Publicado: Public Library of Science 2014
Materias:
Acceso en línea:https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3960097/
https://www.ncbi.nlm.nih.gov/pubmed/24647341
http://dx.doi.org/10.1371/journal.pone.0089545
_version_ 1782308113329684480
author Wan, Shibiao
Mak, Man-Wai
Kung, Sun-Yuan
author_facet Wan, Shibiao
Mak, Man-Wai
Kung, Sun-Yuan
author_sort Wan, Shibiao
collection PubMed
description Protein subcellular localization prediction, as an essential step to elucidate the functions in vivo of proteins and identify drugs targets, has been extensively studied in previous decades. Instead of only determining subcellular localization of single-label proteins, recent studies have focused on predicting both single- and multi-location proteins. Computational methods based on Gene Ontology (GO) have been demonstrated to be superior to methods based on other features. However, existing GO-based methods focus on the occurrences of GO terms and disregard their relationships. This paper proposes a multi-label subcellular-localization predictor, namely HybridGO-Loc, that leverages not only the GO term occurrences but also the inter-term relationships. This is achieved by hybridizing the GO frequencies of occurrences and the semantic similarity between GO terms. Given a protein, a set of GO terms are retrieved by searching against the gene ontology database, using the accession numbers of homologous proteins obtained via BLAST search as the keys. The frequency of GO occurrences and semantic similarity (SS) between GO terms are used to formulate frequency vectors and semantic similarity vectors, respectively, which are subsequently hybridized to construct fusion vectors. An adaptive-decision based multi-label support vector machine (SVM) classifier is proposed to classify the fusion vectors. Experimental results based on recent benchmark datasets and a new dataset containing novel proteins show that the proposed hybrid-feature predictor significantly outperforms predictors based on individual GO features as well as other state-of-the-art predictors. For readers' convenience, the HybridGO-Loc server, which is for predicting virus or plant proteins, is available online at http://bioinfo.eie.polyu.edu.hk/HybridGoServer/.
format Online
Article
Text
id pubmed-3960097
institution National Center for Biotechnology Information
language English
publishDate 2014
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-39600972014-03-24 HybridGO-Loc: Mining Hybrid Features on Gene Ontology for Predicting Subcellular Localization of Multi-Location Proteins Wan, Shibiao Mak, Man-Wai Kung, Sun-Yuan PLoS One Research Article Protein subcellular localization prediction, as an essential step to elucidate the functions in vivo of proteins and identify drugs targets, has been extensively studied in previous decades. Instead of only determining subcellular localization of single-label proteins, recent studies have focused on predicting both single- and multi-location proteins. Computational methods based on Gene Ontology (GO) have been demonstrated to be superior to methods based on other features. However, existing GO-based methods focus on the occurrences of GO terms and disregard their relationships. This paper proposes a multi-label subcellular-localization predictor, namely HybridGO-Loc, that leverages not only the GO term occurrences but also the inter-term relationships. This is achieved by hybridizing the GO frequencies of occurrences and the semantic similarity between GO terms. Given a protein, a set of GO terms are retrieved by searching against the gene ontology database, using the accession numbers of homologous proteins obtained via BLAST search as the keys. The frequency of GO occurrences and semantic similarity (SS) between GO terms are used to formulate frequency vectors and semantic similarity vectors, respectively, which are subsequently hybridized to construct fusion vectors. An adaptive-decision based multi-label support vector machine (SVM) classifier is proposed to classify the fusion vectors. Experimental results based on recent benchmark datasets and a new dataset containing novel proteins show that the proposed hybrid-feature predictor significantly outperforms predictors based on individual GO features as well as other state-of-the-art predictors. For readers' convenience, the HybridGO-Loc server, which is for predicting virus or plant proteins, is available online at http://bioinfo.eie.polyu.edu.hk/HybridGoServer/. Public Library of Science 2014-03-19 /pmc/articles/PMC3960097/ /pubmed/24647341 http://dx.doi.org/10.1371/journal.pone.0089545 Text en © 2014 Wan et al http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are properly credited.
spellingShingle Research Article
Wan, Shibiao
Mak, Man-Wai
Kung, Sun-Yuan
HybridGO-Loc: Mining Hybrid Features on Gene Ontology for Predicting Subcellular Localization of Multi-Location Proteins
title HybridGO-Loc: Mining Hybrid Features on Gene Ontology for Predicting Subcellular Localization of Multi-Location Proteins
title_full HybridGO-Loc: Mining Hybrid Features on Gene Ontology for Predicting Subcellular Localization of Multi-Location Proteins
title_fullStr HybridGO-Loc: Mining Hybrid Features on Gene Ontology for Predicting Subcellular Localization of Multi-Location Proteins
title_full_unstemmed HybridGO-Loc: Mining Hybrid Features on Gene Ontology for Predicting Subcellular Localization of Multi-Location Proteins
title_short HybridGO-Loc: Mining Hybrid Features on Gene Ontology for Predicting Subcellular Localization of Multi-Location Proteins
title_sort hybridgo-loc: mining hybrid features on gene ontology for predicting subcellular localization of multi-location proteins
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3960097/
https://www.ncbi.nlm.nih.gov/pubmed/24647341
http://dx.doi.org/10.1371/journal.pone.0089545
work_keys_str_mv AT wanshibiao hybridgolocmininghybridfeaturesongeneontologyforpredictingsubcellularlocalizationofmultilocationproteins
AT makmanwai hybridgolocmininghybridfeaturesongeneontologyforpredictingsubcellularlocalizationofmultilocationproteins
AT kungsunyuan hybridgolocmininghybridfeaturesongeneontologyforpredictingsubcellularlocalizationofmultilocationproteins