
Joint probabilistic-logical refinement of multiple protein feature predictors

BACKGROUND: Computational methods for the prediction of protein features from sequence are a long-standing focus of bioinformatics. A key observation is that several protein features are closely inter-related, that is, they are conditioned on each other. Researchers invested a lot of effort into des...


Bibliographic Details
Main authors: Teso, Stefano, Passerini, Andrea
Format: Online Article Text
Language: English
Published: BioMed Central 2014
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3929554/
https://www.ncbi.nlm.nih.gov/pubmed/24428894
http://dx.doi.org/10.1186/1471-2105-15-16
author Teso, Stefano
Passerini, Andrea
collection PubMed
description BACKGROUND: Computational methods for the prediction of protein features from sequence are a long-standing focus of bioinformatics. A key observation is that several protein features are closely inter-related, that is, they are conditioned on each other. Researchers invested a lot of effort into designing predictors that exploit this fact. Most existing methods leverage inter-feature constraints by including known (or predicted) correlated features as inputs to the predictor, thus conditioning the result.
RESULTS: By including correlated features as inputs, existing methods only rely on one side of the relation: the output feature is conditioned on the known input features. Here we show how to jointly improve the outputs of multiple correlated predictors by means of a probabilistic-logical consistency layer. The logical layer enforces a set of weighted first-order rules encoding biological constraints between the features, and improves the raw predictions so that they least violate the constraints. In particular, we show how to integrate three stand-alone predictors of correlated features: subcellular localization (Loctree [J Mol Biol 348:85–100, 2005]), disulfide bonding state (Disulfind [Nucleic Acids Res 34:W177–W181, 2006]), and metal bonding state (MetalDetector [Bioinformatics 24:2094–2095, 2008]), in a way that takes into account the respective strengths and weaknesses, and does not require any change to the predictors themselves. We also compare our methodology against two alternative refinement pipelines based on state-of-the-art sequential prediction methods.
CONCLUSIONS: The proposed framework is able to improve the performance of the underlying predictors by removing rule violations. We show that different predictors offer complementary advantages, and our method is able to integrate them using non-trivial constraints, generating more consistent predictions. In addition, our framework is fully general, and could in principle be applied to a vast array of heterogeneous predictions without requiring any change to the underlying software. On the other hand, the alternative strategies are more specific and tend to favor one task at the expense of the others, as shown by our experimental evaluation. The ultimate goal of our framework is to seamlessly integrate full prediction suites, such as Distill [BMC Bioinformatics 7:402, 2006] and PredictProtein [Nucleic Acids Res 32:W321–W326, 2004].
format Online
Article
Text
id pubmed-3929554
institution National Center for Biotechnology Information
language English
publishDate 2014
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-3929554 2014-03-05 Joint probabilistic-logical refinement of multiple protein feature predictors Teso, Stefano Passerini, Andrea BMC Bioinformatics Methodology Article BioMed Central 2014-01-15 /pmc/articles/PMC3929554/ /pubmed/24428894 http://dx.doi.org/10.1186/1471-2105-15-16 Text en
Copyright © 2014 Teso and Passerini; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title Joint probabilistic-logical refinement of multiple protein feature predictors
topic Methodology Article