Protein (multi-)location prediction: using location inter-dependencies in a probabilistic framework

MOTIVATION: Knowing the location of a protein within the cell is important for understanding its function, role in biological processes, and potential use as a drug target. Much progress has been made in developing computational methods that predict single locations for proteins. Most such methods a...

Bibliographic Details
Main authors: Simha, Ramanuja; Shatkay, Hagit
Format: Online Article Text
Language: English
Published: BioMed Central 2014
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3994749/
https://www.ncbi.nlm.nih.gov/pubmed/24646119
http://dx.doi.org/10.1186/1748-7188-9-8
author Simha, Ramanuja
Shatkay, Hagit
collection PubMed
description MOTIVATION: Knowing the location of a protein within the cell is important for understanding its function, role in biological processes, and potential use as a drug target. Much progress has been made in developing computational methods that predict single locations for proteins. Most such methods are based on the over-simplifying assumption that proteins localize to a single location. However, it has been shown that proteins localize to multiple locations. While a few recent systems attempt to predict multiple locations of proteins, their performance leaves much room for improvement. Moreover, they typically treat locations as independent and do not attempt to utilize possible inter-dependencies among locations. Our hypothesis is that directly incorporating inter-dependencies among locations into both the classifier-learning and the prediction process can improve location prediction performance. RESULTS: We present a new method and a preliminary system we have developed that directly incorporates inter-dependencies among locations into the location-prediction process of multiply-localized proteins. Our method is based on a collection of Bayesian network classifiers, where each classifier is used to predict a single location. Learning the structure of each Bayesian network classifier takes into account inter-dependencies among locations, and the prediction process uses estimates involving multiple locations. We evaluate our system on a dataset of single- and multi-localized proteins (the most comprehensive protein multi-localization dataset currently available, derived from the DBMLoc dataset). Our results, obtained by incorporating inter-dependencies, are significantly higher than those obtained by classifiers that do not use inter-dependencies. The performance of our system on multi-localized proteins is comparable to a top performing system (YLoc(+)), without being restricted only to location-combinations present in the training set.
format Online
Article
Text
id pubmed-3994749
institution National Center for Biotechnology Information
language English
publishDate 2014
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-3994749 2014-05-07 Protein (multi-)location prediction: using location inter-dependencies in a probabilistic framework Simha, Ramanuja Shatkay, Hagit Algorithms Mol Biol Research BioMed Central 2014-03-19 /pmc/articles/PMC3994749/ /pubmed/24646119 http://dx.doi.org/10.1186/1748-7188-9-8 Text en Copyright © 2014 Simha and Shatkay; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title Protein (multi-)location prediction: using location inter-dependencies in a probabilistic framework
topic Research
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3994749/
https://www.ncbi.nlm.nih.gov/pubmed/24646119
http://dx.doi.org/10.1186/1748-7188-9-8