
On Simplified Global Nonlinear Function for Fitness Landscape: A Case Study of Inverse Protein Folding

The construction of fitness landscape has broad implication in understanding molecular evolution, cellular epigenetic state, and protein structures. We studied the problem of constructing fitness landscape of inverse protein folding or protein design, with the aim to generate amino acid sequences th...


Bibliographic Details
Main Authors: Xu, Yun, Hu, Changyu, Dai, Yang, Liang, Jie
Format: Online Article Text
Language: English
Published: Public Library of Science 2014
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4128808/
https://www.ncbi.nlm.nih.gov/pubmed/25110986
http://dx.doi.org/10.1371/journal.pone.0104403
author Xu, Yun
Hu, Changyu
Dai, Yang
Liang, Jie
collection PubMed
description The construction of fitness landscapes has broad implications for understanding molecular evolution, cellular epigenetic states, and protein structures. We studied the problem of constructing the fitness landscape of inverse protein folding, or protein design, with the aim of generating amino acid sequences that would fold into an a priori determined structural fold, which would enable engineering novel or enhanced biochemistry. For this task, an effective fitness function should allow identification of correct sequences that would fold into the desired structure. In this study, we showed that a nonlinear fitness function for protein design can be constructed using a rectangular kernel with a basis set of proteins and decoys chosen a priori. The full landscape for a large number of protein folds can be captured using only 480 native proteins and 3,200 non-protein decoys via a finite Newton method. A blind test of a simplified version of the fitness function for sequence design was carried out to discriminate simultaneously 428 native sequences not homologous to any training proteins from 11 million challenging protein-like decoys. This simplified function correctly classified 408 native sequences (20 misclassifications, 95% correct rate), which outperforms several other statistical linear scoring functions and an optimized linear function. Our results further suggested that, for the task of global sequence design of 428 selected proteins, the search space of protein shape and sequence can be effectively parametrized with a carefully chosen basis set of just about 3,680 proteins and decoys, and we showed in addition that the overall landscape is not overly sensitive to the specific choice of this set. Our results can be generalized to construct other types of fitness landscapes.
format Online
Article
Text
id pubmed-4128808
institution National Center for Biotechnology Information
language English
publishDate 2014
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-4128808 2014-08-12 On Simplified Global Nonlinear Function for Fitness Landscape: A Case Study of Inverse Protein Folding Xu, Yun Hu, Changyu Dai, Yang Liang, Jie PLoS One Research Article
Public Library of Science 2014-08-11 /pmc/articles/PMC4128808/ /pubmed/25110986 http://dx.doi.org/10.1371/journal.pone.0104403 Text en © 2014 Xu et al http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are properly credited.
title On Simplified Global Nonlinear Function for Fitness Landscape: A Case Study of Inverse Protein Folding
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4128808/
https://www.ncbi.nlm.nih.gov/pubmed/25110986
http://dx.doi.org/10.1371/journal.pone.0104403
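The description in this record refers to a nonlinear fitness function built from a kernel evaluated against a fixed basis set of native proteins and decoys. The following is only a minimal sketch of that general idea, not the authors' implementation. It assumes a Gaussian kernel, reads "rectangular kernel" as an n x m kernel matrix between n samples and m basis vectors, and uses hypothetical two-dimensional feature vectors and hand-picked weights. In the study the weights are fitted with a finite Newton method, which is not implemented here.

```python
import math


def gaussian_kernel(x, y, gamma=0.5):
    """Return exp(-gamma * ||x - y||^2). The Gaussian form is an assumption."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)


def rectangular_kernel(samples, basis, gamma=0.5):
    """Build the n x m kernel matrix between n samples and m basis vectors."""
    return [[gaussian_kernel(s, b, gamma) for b in basis] for s in samples]


def fitness(sample, basis, weights, bias=0.0, gamma=0.5):
    """Nonlinear score: weighted sum of kernel values against the basis set."""
    total = sum(w * gaussian_kernel(sample, b, gamma)
                for w, b in zip(weights, basis))
    return total + bias


# Hypothetical toy data: the first two basis vectors stand in for native
# proteins (positive weight), the last one for a decoy (negative weight).
basis = [(0.0, 0.0), (1.0, 1.0), (3.0, 3.0)]
weights = [1.0, 1.0, -1.0]
samples = [(0.5, 0.5), (3.0, 2.5)]

K = rectangular_kernel(samples, basis)          # 2 rows, 3 columns
scores = [fitness(s, basis, weights) for s in samples]
labels = ["native" if score > 0 else "decoy" for score in scores]

# Arithmetic quoted in the description.
basis_size = 480 + 3200        # native proteins plus non-protein decoys
correct_rate = 408 / 428       # correctly classified native sequences
print(labels, basis_size, round(100 * correct_rate, 1))
```

The sign of the score separates native-like from decoy-like samples in this toy setting. The two figures at the end restate the counts given in the description: 480 proteins plus 3,200 decoys give the basis set of 3,680, and 408 of 428 native sequences correspond to the reported 95% correct rate.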