
Novel Descriptors and Digital Signal Processing-Based Method for Protein Sequence Activity Relationship Study

Unraveling the correlation between protein sequence and function in the absence of structural information can be highly rewarding. We present a new way of using descriptors from the amino acid index database for modeling and predicting the fitness value of a polypeptide chain....


Bibliographic Details
Main Authors: Fontaine, Nicolas T., Cadet, Xavier F., Vetrivel, Iyanar
Format: Online Article Text
Language: English
Published: MDPI 2019
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6888668/
https://www.ncbi.nlm.nih.gov/pubmed/31718061
http://dx.doi.org/10.3390/ijms20225640
_version_ 1783475283491291136
author Fontaine, Nicolas T.
Cadet, Xavier F.
Vetrivel, Iyanar
collection PubMed
description Unraveling the correlation between protein sequence and function in the absence of structural information can be highly rewarding. We present a new way of using descriptors from the amino acid index database for modeling and predicting the fitness value of a polypeptide chain. The approach comprises the following steps: (i) calculating Q elementary numerical sequences (Ele_SEQ), each depending on the encoding of the amino acid residues; (ii) determining an extended numerical sequence (Ext_SEQ) by concatenating the Q elementary numerical sequences, wherein at least one elementary numerical sequence is a protein spectrum obtained by applying the fast Fourier transform (FFT); and (iii) predicting a fitness value for polypeptide variants (training and/or validation set). These new descriptors were tested on four sets of proteins of different lengths (GLP-2, TNF alpha, cytochrome P450, and epoxide hydrolase) and activities (cAMP activation, binding affinity, thermostability, and enantioselectivity). We show that using multiple physicochemical descriptors together with the FFT, which takes into account the interactions between amino acid residues within the protein sequence, could lead to a very significant improvement in the quality of models and predictions. The choice of the descriptor, or of the combination of descriptors and/or FFT, depends on the protein/fitness pair. This approach can add value to existing mutant libraries where screening efforts have so far been unsuccessful in finding improved polypeptide mutants for useful applications.
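The three steps in the description can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the record does not say which descriptors or which regression model were used, so the Kyte-Doolittle hydropathy scale, the simplified `CHARGE` table, the ridge regression, the function names, and the toy variants with their fitness values are all placeholders chosen here for illustration.

```python
import numpy as np

# Kyte-Doolittle hydropathy values (a widely used hydrophobicity scale).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}
# Simplified side-chain charge: an illustrative stand-in, not a database entry.
CHARGE = {aa: 0.0 for aa in KYTE_DOOLITTLE}
CHARGE.update({"K": 1.0, "R": 1.0, "D": -1.0, "E": -1.0})


def encode(seq, index):
    """Step (i): map each residue to its descriptor value (one Ele_SEQ)."""
    return np.array([index[aa] for aa in seq], dtype=float)


def spectrum(signal):
    """Magnitude spectrum of the mean-centered numerical sequence."""
    return np.abs(np.fft.rfft(signal - signal.mean()))


def extended_sequence(seq, indices, use_fft=True):
    """Step (ii): concatenate the Q elementary sequences into one Ext_SEQ."""
    parts = []
    for index in indices:
        ele = encode(seq, index)
        parts.append(spectrum(ele) if use_fft else ele)
    return np.concatenate(parts)


def fit_ridge(X, y, alpha=1.0):
    """Step (iii), assumed model: ridge regression in closed form.

    A constant column is appended for the intercept; for simplicity the
    intercept is penalized along with the other weights.
    """
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])
    gram = Xb.T @ Xb + alpha * np.eye(Xb.shape[1])
    return np.linalg.solve(gram, Xb.T @ y)


def predict(X, weights):
    """Apply fitted weights to new Ext_SEQ feature rows."""
    return np.hstack([X, np.ones((X.shape[0], 1))]) @ weights


if __name__ == "__main__":
    # Arbitrary equal-length toy variants with synthetic fitness values.
    variants = ["MKTAYIAKQR", "MKTAYIAKQE", "MKTLYIAKQR", "MKTAYIDKQR"]
    fitness = np.array([1.0, 0.6, 1.3, 0.4])  # made up, not measured data
    X = np.vstack([extended_sequence(v, [KYTE_DOOLITTLE, CHARGE])
                   for v in variants])
    weights = fit_ridge(X, fitness)
    print(predict(X, weights))
```

Passing `use_fft=False` keeps the raw encoded sequences, which makes it easy to compare descriptor-only and FFT-based features for a given protein/fitness pair. All variants must have the same length so that their Ext_SEQ vectors line up as rows of one feature matrix.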
format Online
Article
Text
id pubmed-6888668
institution National Center for Biotechnology Information
language English
publishDate 2019
publisher MDPI
record_format MEDLINE/PubMed
spelling pubmed-6888668 2019-12-09 Novel Descriptors and Digital Signal Processing-Based Method for Protein Sequence Activity Relationship Study Fontaine, Nicolas T. Cadet, Xavier F. Vetrivel, Iyanar Int J Mol Sci Article MDPI 2019-11-11 /pmc/articles/PMC6888668/ /pubmed/31718061 http://dx.doi.org/10.3390/ijms20225640 Text en © 2019 by the authors.
Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
title Novel Descriptors and Digital Signal Processing-Based Method for Protein Sequence Activity Relationship Study
topic Article