
Optimization of minimum set of protein–DNA interactions: a quasi exact solution with minimum over-fitting

Motivation: A major limitation in modeling protein interactions is the difficulty of assessing the over-fitting of the training set. Recently, an experimentally based approach that integrates crystallographic information of C2H2 zinc finger–DNA complexes with binding data from 11 mutants, 7 from EGR...

Bibliographic Details
Main Authors:	Temiz, N. A., Trapp, A., Prokopyev, O. A., Camacho, C. J.
Format:	Text
Language:	English
Published:	Oxford University Press 2010
Subjects:	Original Papers
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2815656/ https://www.ncbi.nlm.nih.gov/pubmed/19965883 http://dx.doi.org/10.1093/bioinformatics/btp664

collection	PubMed
description	Motivation: A major limitation in modeling protein interactions is the difficulty of assessing the over-fitting of the training set. Recently, an experimentally based approach that integrates crystallographic information of C2H2 zinc finger–DNA complexes with binding data from 11 mutants, 7 from EGR finger I, was used to define an improved interaction code (no optimization). Here, we present a novel mixed integer programming (MIP)-based method that transforms this type of data into an optimized code, demonstrating both the advantages of the mathematical formulation to minimize over- and under-fitting and the robustness of the underlying physical parameters mapped by the code. Results: Based on the structural models of feasible interaction networks for 35 mutants of EGR–DNA complexes, the MIP method minimizes the cumulative binding energy over all complexes for a general set of fundamental protein–DNA interactions. To guard against over-fitting, we use the scalability of the method to probe against the elimination of related interactions. From an initial set of 12 parameters (six hydrogen bonds, five desolvation penalties and a water factor), we proceed to eliminate five of them with only a marginal reduction of the correlation coefficient to 0.9983. Further reduction of parameters negatively impacts the performance of the code (under-fitting). Besides accurately predicting the change in binding affinity of validation sets, the code identifies possible context-dependent effects in the definition of the interaction networks. Yet, the approach of constraining predictions to within a pre-selected set of interactions limits the impact of these potential errors to related low-affinity complexes. Contact: ccamacho@pitt.edu; droleg@pitt.edu Supplementary information: Supplementary data are available at Bioinformatics online.
id	pubmed-2815656
institution	National Center for Biotechnology Information
record_format	MEDLINE/PubMed
spelling	pubmed-2815656 2010-02-03 Bioinformatics Oxford University Press 2010-02-01 2009-12-04 Text en © The Author(s) 2009. Published by Oxford University Press. http://creativecommons.org/licenses/by-nc/2.0/uk/ This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/2.5) which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.


Similar Items