
Application of fourier transform and proteochemometrics principles to protein engineering


Bibliographic Details
Main authors: Cadet, Frédéric; Fontaine, Nicolas; Vetrivel, Iyanar; Ng Fuk Chong, Matthieu; Savriama, Olivier; Cadet, Xavier; Charton, Philippe
Format: Online Article Text
Language: English
Published: BioMed Central, 2018
Subjects: Research Article
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6191906/
https://www.ncbi.nlm.nih.gov/pubmed/30326841
http://dx.doi.org/10.1186/s12859-018-2407-8

Abstract
BACKGROUND: Connecting the dots between a protein's sequence and its function is of fundamental interest to protein engineers. In silico methods are useful in this quest, especially when structural information is not available. In this study we propose a mutant library screening tool called iSAR (innovative Sequence Activity Relationship) that relies on the physicochemical properties of the amino acids, digital signal processing and partial least squares regression to uncover these sequence-function correlations.
RESULTS: We show that a digitized representation of the protein sequence in the form of a Fourier spectrum can be used as an efficient descriptor to model the sequence-activity relationship of proteins. The iSAR methodology we have developed identifies high-fitness mutants in mutant libraries using the physicochemical properties of the amino acids, digital signal processing and regression techniques. iSAR correlates the mutation-induced variations in the spectra with biological activity/fitness. It takes into account the impact of mutations on the whole spectrum rather than focusing on local fitness alone. The utility of the method is illustrated on four datasets: cytochrome P450 for thermostability, TNF-alpha for binding affinity, GLP-2 for potency and enterotoxins for thermostability. The datasets were chosen to illustrate the method's ability to perform when limited training data are available and when the test set contains novel mutations that do not appear in the training set.
CONCLUSION: The combination of the Fast Fourier Transform and Partial Least Squares regression is efficient in capturing the effects of mutations on protein function. iSAR is a fast algorithm that can be implemented with limited computational resources and can make effective predictions even when the training set is small.
ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (10.1186/s12859-018-2407-8) contains supplementary material, which is available to authorized users.
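The abstract names three ingredients (a physicochemical encoding of each sequence, its Fourier spectrum, and a partial least squares model linking spectra to activity) without giving details. The following is a minimal sketch of that kind of pipeline under stated assumptions; it is not the authors' iSAR implementation. Assumptions not taken from the record: the Kyte-Doolittle hydropathy scale stands in for the physicochemical index, each encoded sequence is zero-padded to the next power of two before a radix-2 FFT, only the magnitudes of the non-redundant half of the spectrum are kept as descriptors, and a single-response PLS (PLS1) model is fitted with the NIPALS algorithm. The helper names (`spectrum`, `fit_pls1`, `predict_pls1`) and the toy sequences and activity values are hypothetical.

```python
import cmath
import math

# ASSUMPTION: Kyte-Doolittle hydropathy as the physicochemical index.
# The record does not say which index iSAR uses; any per-residue scale fits here.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
    "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
    "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def fft(x):
    """Recursive radix-2 Cooley-Tukey FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even, odd = fft(x[0::2]), fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        twiddled = cmath.exp(-2j * math.pi * k / n) * odd[k]
        out[k] = even[k] + twiddled
        out[k + n // 2] = even[k] - twiddled
    return out


def spectrum(seq, scale=KD):
    """Encode seq residue by residue, zero-pad to a power of two, and return
    |FFT| for frequencies 0..n/2 (the other half mirrors them for real input)."""
    signal = [scale[aa] for aa in seq]
    n = 1 << (len(signal) - 1).bit_length()  # next power of two >= len(seq)
    signal += [0.0] * (n - len(signal))
    return [abs(c) for c in fft(signal)[: n // 2 + 1]]


def mean(values):
    return sum(values) / len(values)


def fit_pls1(X, y, n_components=2):
    """PLS1 regression (one response) fitted with NIPALS on mean-centered data."""
    x_mean = [mean(col) for col in zip(*X)]
    y_mean = mean(y)
    E = [[v - m for v, m in zip(row, x_mean)] for row in X]  # centered X
    f = [v - y_mean for v in y]                              # centered y
    comps = []
    for _ in range(n_components):
        w = [sum(e * fi for e, fi in zip(col, f)) for col in zip(*E)]  # E^T f
        norm = math.sqrt(sum(v * v for v in w))
        if norm == 0:
            break  # residual y is orthogonal to residual X: nothing left to fit
        w = [v / norm for v in w]
        t = [sum(e * wj for e, wj in zip(row, w)) for row in E]  # scores
        tt = sum(v * v for v in t)
        p = [sum(e * ti for e, ti in zip(col, t)) / tt for col in zip(*E)]
        q = sum(fi * ti for fi, ti in zip(f, t)) / tt
        # Deflate X and y by what this component explains.
        E = [[e - ti * pj for e, pj in zip(row, p)] for row, ti in zip(E, t)]
        f = [fi - q * ti for fi, ti in zip(f, t)]
        comps.append((w, p, q))
    return {"x_mean": x_mean, "y_mean": y_mean, "comps": comps}


def predict_pls1(model, x):
    """Predict one response by replaying the per-component deflation on x."""
    e = [v - m for v, m in zip(x, model["x_mean"])]
    y_hat = model["y_mean"]
    for w, p, q in model["comps"]:
        t = sum(ei * wi for ei, wi in zip(e, w))
        e = [ei - t * pi for ei, pi in zip(e, p)]
        y_hat += q * t
    return y_hat


if __name__ == "__main__":
    # Hypothetical toy library and activities, for illustration only.
    library = ["MKTAYIAK", "MKTVYIAK", "MKTAYLAK", "MRTAYIAK", "MKTAYIEK"]
    activity = [1.0, 1.4, 0.7, 1.1, 0.5]
    X = [spectrum(s) for s in library]
    model = fit_pls1(X, activity, n_components=2)
    print(predict_pls1(model, spectrum("MKTVYLAK")))
```

The sketch is written in plain Python so it runs without third-party packages. With NumPy and scikit-learn available, the same steps would typically use `numpy.fft.rfft` (which already returns the n/2 + 1 non-redundant coefficients of a real signal) and `sklearn.cross_decomposition.PLSRegression`. In practice the number of PLS components would be chosen by cross-validation rather than fixed in advance.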

Journal: BMC Bioinformatics (Research Article), published 2018-10-16
License: © The Author(s). 2018 Open Access. This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.