Protein (multi-)location prediction: using location inter-dependencies in a probabilistic framework
MOTIVATION: Knowing the location of a protein within the cell is important for understanding its function, role in biological processes, and potential use as a drug target. Much progress has been made in developing computational methods that predict single locations for proteins. Most such methods a...
Main authors: | Simha, Ramanuja; Shatkay, Hagit |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central 2014 |
Subjects: | Research |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3994749/ https://www.ncbi.nlm.nih.gov/pubmed/24646119 http://dx.doi.org/10.1186/1748-7188-9-8 |
_version_ | 1782312787296387072 |
---|---|
author | Simha, Ramanuja Shatkay, Hagit |
collection | PubMed |
description | MOTIVATION: Knowing the location of a protein within the cell is important for understanding its function, role in biological processes, and potential use as a drug target. Much progress has been made in developing computational methods that predict single locations for proteins. Most such methods are based on the over-simplifying assumption that proteins localize to a single location. However, it has been shown that proteins localize to multiple locations. While a few recent systems attempt to predict multiple locations of proteins, their performance leaves much room for improvement. Moreover, they typically treat locations as independent and do not attempt to utilize possible inter-dependencies among locations. Our hypothesis is that directly incorporating inter-dependencies among locations into both the classifier-learning and the prediction process can improve location prediction performance. RESULTS: We present a new method and a preliminary system we have developed that directly incorporates inter-dependencies among locations into the location-prediction process of multiply-localized proteins. Our method is based on a collection of Bayesian network classifiers, where each classifier is used to predict a single location. Learning the structure of each Bayesian network classifier takes into account inter-dependencies among locations, and the prediction process uses estimates involving multiple locations. We evaluate our system on a dataset of single- and multi-localized proteins (the most comprehensive protein multi-localization dataset currently available, derived from the DBMLoc dataset). Our results, obtained by incorporating inter-dependencies, are significantly higher than those obtained by classifiers that do not use inter-dependencies. The performance of our system on multi-localized proteins is comparable to a top performing system (YLoc+), without being restricted only to location-combinations present in the training set. |
format | Online Article Text |
id | pubmed-3994749 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2014 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-3994749 2014-05-07 Protein (multi-)location prediction: using location inter-dependencies in a probabilistic framework Simha, Ramanuja Shatkay, Hagit Algorithms Mol Biol Research BioMed Central 2014-03-19 /pmc/articles/PMC3994749/ /pubmed/24646119 http://dx.doi.org/10.1186/1748-7188-9-8 Text en Copyright © 2014 Simha and Shatkay; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. |
title | Protein (multi-)location prediction: using location inter-dependencies in a probabilistic framework |
topic | Research |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3994749/ https://www.ncbi.nlm.nih.gov/pubmed/24646119 http://dx.doi.org/10.1186/1748-7188-9-8 |
work_keys_str_mv | AT simharamanuja proteinmultilocationpredictionusinglocationinterdependenciesinaprobabilisticframework AT shatkayhagit proteinmultilocationpredictionusinglocationinterdependenciesinaprobabilisticframework |
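
The abstract describes one classifier per location, with prediction drawing on estimates for the other locations. The sketch below illustrates that general idea only. It is not the authors' system: it swaps in plain naive Bayes classifiers (the simplest Bayesian network classifier) with a fixed structure instead of learned network structures, and every name, feature value, and training row in it is hypothetical toy data.

```python
from collections import defaultdict
import math


class BinaryNaiveBayes:
    """Naive Bayes over discrete features with add-one smoothing."""

    def fit(self, rows, labels):
        self.class_counts = {0: 0, 1: 0}
        self.feature_counts = defaultdict(int)  # (class, index, value) -> count
        self.values = defaultdict(set)          # index -> values seen in training
        for row, y in zip(rows, labels):
            self.class_counts[y] += 1
            for i, v in enumerate(row):
                self.feature_counts[(y, i, v)] += 1
                self.values[i].add(v)
        return self

    def prob_positive(self, row):
        total = sum(self.class_counts.values())
        scores = {}
        for y in (0, 1):
            score = math.log((self.class_counts[y] + 1) / (total + 2))
            for i, v in enumerate(row):
                num = self.feature_counts[(y, i, v)] + 1
                # One extra slot is reserved for values unseen in training.
                den = self.class_counts[y] + len(self.values[i]) + 1
                score += math.log(num / den)
            scores[y] = score
        top = max(scores.values())
        weights = {y: math.exp(s - top) for y, s in scores.items()}
        return weights[1] / (weights[0] + weights[1])


def train(features, label_sets, locations):
    """Fit two classifiers per location: one on protein features alone,
    one on features plus indicators for all the other locations."""
    independent, dependent = {}, {}
    for loc in locations:
        y = [int(loc in s) for s in label_sets]
        others = [o for o in locations if o != loc]
        independent[loc] = BinaryNaiveBayes().fit(features, y)
        augmented = [tuple(f) + tuple(int(o in s) for o in others)
                     for f, s in zip(features, label_sets)]
        dependent[loc] = BinaryNaiveBayes().fit(augmented, y)
    return independent, dependent


def predict(row, independent, dependent, locations, threshold=0.5):
    """First estimate every location independently, then re-predict each
    location using the estimates for the other locations as inputs."""
    first_pass = {loc: int(independent[loc].prob_positive(row) >= threshold)
                  for loc in locations}
    predicted = set()
    for loc in locations:
        others = [o for o in locations if o != loc]
        augmented = tuple(row) + tuple(first_pass[o] for o in others)
        if dependent[loc].prob_positive(augmented) >= threshold:
            predicted.add(loc)
    return predicted


# Hypothetical toy data: one discrete "sorting signal" feature per protein.
LOCATIONS = ["nucleus", "cytoplasm", "mitochondrion"]
FEATURES = [("nls",), ("nls",), ("mts",), ("mts",), ("none",), ("none",)]
LABEL_SETS = [{"nucleus", "cytoplasm"}, {"nucleus"}, {"mitochondrion"},
              {"mitochondrion", "cytoplasm"}, {"cytoplasm"}, {"cytoplasm"}]

independent, dependent = train(FEATURES, LABEL_SETS, LOCATIONS)
predicted = predict(("nls",), independent, dependent, LOCATIONS)
print(sorted(predicted))
```

Because each location is decided by its own classifier, the predicted set can be a combination of locations that never appears in the training rows, which is the property the abstract contrasts with systems restricted to combinations seen in training.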