
A maximum likelihood framework for protein design

BACKGROUND: The aim of protein design is to predict amino-acid sequences compatible with a given target structure. Traditionally envisioned as a purely thermodynamic question, this problem can also be understood in a wider context, where additional constraints are captured by learning the sequence p...


Bibliographic Details
Main authors:	Kleinman, Claudia L, Rodrigue, Nicolas, Bonnard, Cécile, Philippe, Hervé, Lartillot, Nicolas
Format:	Text
Language:	English
Published:	BioMed Central 2006
Subjects:	Methodology Article
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1570151/ https://www.ncbi.nlm.nih.gov/pubmed/16808841 http://dx.doi.org/10.1186/1471-2105-7-326

author	Kleinman, Claudia L Rodrigue, Nicolas Bonnard, Cécile Philippe, Hervé Lartillot, Nicolas
collection	PubMed
description	BACKGROUND: The aim of protein design is to predict amino-acid sequences compatible with a given target structure. Traditionally envisioned as a purely thermodynamic question, this problem can also be understood in a wider context, where additional constraints are captured by learning the sequence patterns displayed by natural proteins of known conformation. In this latter perspective, however, we still need a theoretical formalization of the question, leading to general and efficient learning methods, and allowing for the selection of fast and accurate objective functions quantifying sequence/structure compatibility. RESULTS: We propose a formulation of the protein design problem in terms of model-based statistical inference. Our framework uses the maximum likelihood principle to optimize the unknown parameters of a statistical potential, which we call an inverse potential to contrast with classical potentials used for structure prediction. We propose an implementation based on Markov chain Monte Carlo, in which the likelihood is maximized by gradient descent and is numerically estimated by thermodynamic integration. The fit of the models is evaluated by cross-validation. We apply this to a simple pairwise contact potential, supplemented with a solvent-accessibility term, and show that the resulting models have a better predictive power than currently available pairwise potentials. Furthermore, the model comparison method presented here allows one to measure the relative contribution of each component of the potential, and to choose the optimal number of accessibility classes, which turns out to be much higher than classically considered. CONCLUSION: Altogether, this reformulation makes it possible to test a wide diversity of models, using different forms of potentials, or accounting for other factors than just the constraint of thermodynamic stability. Ultimately, such model-based statistical analyses may help to understand the forces shaping protein sequences, and driving their evolution.
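
The likelihood maximization described in the abstract can be illustrated with a toy sketch. It is not the authors' implementation: it assumes a sequence distribution p(s) proportional to exp(-E(s)), where E(s) sums one parameter theta[a][b] per contact between residue types a and b, and it estimates the gradient of the log-likelihood (expected minus observed contact counts) with a simple Metropolis sampler. The solvent-accessibility term, thermodynamic integration, and cross-validation are omitted, and all names (`pair_counts`, `energy`, `expected_counts`, `fit`), the two-letter alphabet, and the four-residue ring of contacts are hypothetical choices made for illustration.

```python
import math
import random


def pair_counts(seq, contacts, k):
    """Count contacts per unordered residue-type pair (stored with a <= b)."""
    counts = [[0.0] * k for _ in range(k)]
    for i, j in contacts:
        a, b = sorted((seq[i], seq[j]))
        counts[a][b] += 1.0
    return counts


def energy(seq, contacts, theta):
    """Pseudo-energy: sum of theta over all contacts of the structure."""
    total = 0.0
    for i, j in contacts:
        a, b = sorted((seq[i], seq[j]))
        total += theta[a][b]
    return total


def expected_counts(theta, contacts, n, k, sweeps, rng):
    """Metropolis estimate of mean contact counts under p(s) ~ exp(-E(s))."""
    seq = [rng.randrange(k) for _ in range(n)]
    e = energy(seq, contacts, theta)
    acc = [[0.0] * k for _ in range(k)]
    burn_in = sweeps // 5  # discard the first fifth of the sweeps
    for sweep in range(sweeps):
        for _ in range(n):
            pos = rng.randrange(n)
            old = seq[pos]
            seq[pos] = rng.randrange(k)  # symmetric proposal
            e_new = energy(seq, contacts, theta)
            if e_new <= e or rng.random() < math.exp(e - e_new):
                e = e_new          # accept the move
            else:
                seq[pos] = old     # reject and restore
        if sweep >= burn_in:
            c = pair_counts(seq, contacts, k)
            for a in range(k):
                for b in range(k):
                    acc[a][b] += c[a][b]
    kept = sweeps - burn_in
    return [[v / kept for v in row] for row in acc]


def fit(observed, contacts, k, steps=20, lr=0.2, sweeps=100, seed=0):
    """Gradient ascent on the log-likelihood of the observed sequences."""
    rng = random.Random(seed)
    n = len(observed[0])
    obs = [[0.0] * k for _ in range(k)]
    for seq in observed:
        c = pair_counts(seq, contacts, k)
        for a in range(k):
            for b in range(k):
                obs[a][b] += c[a][b] / len(observed)
    theta = [[0.0] * k for _ in range(k)]
    for _ in range(steps):
        exp = expected_counts(theta, contacts, n, k, sweeps, rng)
        for a in range(k):
            for b in range(a, k):
                # d ln p / d theta_ab = E[n_ab] - n_ab(observed)
                theta[a][b] += lr * (exp[a][b] - obs[a][b])
    return theta


# Toy data: a 4-residue ring where contacting residues always match.
contacts = [(0, 1), (1, 2), (2, 3), (0, 3)]
observed = [[0, 0, 0, 0], [1, 1, 1, 1]]
theta = fit(observed, contacts, k=2)
print(theta)
```

In this toy setting the mixed pair never occurs in the observed sequences, so its parameter is pushed upward (penalized) while the like-with-like parameters are pushed downward on balance. A real application would use the 20 amino acids, contact maps from known structures, and a held-out set to compare models.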
format	Text
id	pubmed-1570151
institution	National Center for Biotechnology Information
language	English
publishDate	2006
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-1570151 2006-10-02 BMC Bioinformatics BioMed Central 2006-06-29 Text en Copyright © 2006 Kleinman et al; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title	A maximum likelihood framework for protein design
topic	Methodology Article
