
Joint probabilistic-logical refinement of multiple protein feature predictors

BACKGROUND: Computational methods for the prediction of protein features from sequence are a long-standing focus of bioinformatics. A key observation is that several protein features are closely inter-related, that is, they are conditioned on each other. Researchers invested a lot of effort into des...


Bibliographic Details
Main authors:	Teso, Stefano, Passerini, Andrea
Format:	Online Article Text
Language:	English
Published:	BioMed Central 2014
Subjects:	Methodology Article
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3929554/ https://www.ncbi.nlm.nih.gov/pubmed/24428894 http://dx.doi.org/10.1186/1471-2105-15-16

author	Teso, Stefano Passerini, Andrea
collection	PubMed
description	BACKGROUND: Computational methods for the prediction of protein features from sequence are a long-standing focus of bioinformatics. A key observation is that several protein features are closely inter-related, that is, they are conditioned on each other. Researchers invested a lot of effort into designing predictors that exploit this fact. Most existing methods leverage inter-feature constraints by including known (or predicted) correlated features as inputs to the predictor, thus conditioning the result. RESULTS: By including correlated features as inputs, existing methods only rely on one side of the relation: the output feature is conditioned on the known input features. Here we show how to jointly improve the outputs of multiple correlated predictors by means of a probabilistic-logical consistency layer. The logical layer enforces a set of weighted first-order rules encoding biological constraints between the features, and improves the raw predictions so that they least violate the constraints. In particular, we show how to integrate three stand-alone predictors of correlated features: subcellular localization (Loctree [J Mol Biol 348:85–100, 2005]), disulfide bonding state (Disulfind [Nucleic Acids Res 34:W177–W181, 2006]), and metal bonding state (MetalDetector [Bioinformatics 24:2094–2095, 2008]), in a way that takes into account the respective strengths and weaknesses, and does not require any change to the predictors themselves. We also compare our methodology against two alternative refinement pipelines based on state-of-the-art sequential prediction methods. CONCLUSIONS: The proposed framework is able to improve the performance of the underlying predictors by removing rule violations. We show that different predictors offer complementary advantages, and our method is able to integrate them using non-trivial constraints, generating more consistent predictions. In addition, our framework is fully general, and could in principle be applied to a vast array of heterogeneous predictions without requiring any change to the underlying software. On the other hand, the alternative strategies are more specific and tend to favor one task at the expense of the others, as shown by our experimental evaluation. The ultimate goal of our framework is to seamlessly integrate full prediction suites, such as Distill [BMC Bioinformatics 7:402, 2006] and PredictProtein [Nucleic Acids Res 32:W321–W326, 2004].
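The idea of a consistency layer that adjusts raw predictions so that they least violate weighted rules can be illustrated with a toy sketch. This is not the authors' implementation: the probabilities, the single mutual-exclusion rule, its weight, and the names `score` and `refine` are all hypothetical, and exhaustive search stands in for whatever inference procedure the paper actually uses.

```python
from itertools import product
from math import log

# Hypothetical raw predictor outputs for one cysteine: P(statement is true).
raw = {"disulfide": 0.6, "metal": 0.7}

# Hypothetical weighted rule (assumption, for illustration only):
# a cysteine is not both disulfide-bonded and metal-bound.
rules = [(2.0, lambda a: not (a["disulfide"] and a["metal"]))]

def score(assign, raw, rules):
    """Log-likelihood under the raw predictors minus weights of violated rules."""
    fit = sum(log(raw[k] if v else 1.0 - raw[k]) for k, v in assign.items())
    penalty = sum(w for w, holds in rules if not holds(assign))
    return fit - penalty

def refine(raw, rules):
    """Return the truth assignment with the highest score (brute force)."""
    names = sorted(raw)
    candidates = (dict(zip(names, vals))
                  for vals in product([False, True], repeat=len(names)))
    return max(candidates, key=lambda a: score(a, raw, rules))

refined = refine(raw, rules)
unconstrained = refine(raw, [])
```

With no rules, both statements are predicted true, because each raw probability exceeds 0.5. With the rule in place, the less confident prediction (disulfide) is flipped, so the refined output no longer violates the constraint. Because the rule is weighted rather than hard, a sufficiently confident pair of raw predictions could still override it.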
format	Online Article Text
id	pubmed-3929554
institution	National Center for Biotechnology Information
language	English
publishDate	2014
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-3929554 2014-03-05 Joint probabilistic-logical refinement of multiple protein feature predictors Teso, Stefano Passerini, Andrea BMC Bioinformatics Methodology Article BioMed Central 2014-01-15 /pmc/articles/PMC3929554/ /pubmed/24428894 http://dx.doi.org/10.1186/1471-2105-15-16 Text en Copyright © 2014 Teso and Passerini; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title	Joint probabilistic-logical refinement of multiple protein feature predictors
topic	Methodology Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3929554/ https://www.ncbi.nlm.nih.gov/pubmed/24428894 http://dx.doi.org/10.1186/1471-2105-15-16


Similar items