Application of fourier transform and proteochemometrics principles to protein engineering
BACKGROUND: Connecting the dots between the protein sequence and its function is of fundamental interest for protein engineers. In-silico methods are useful in this quest especially when structural information is not available. In this study we propose a mutant library screening tool called iSAR (in...
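Main authors: | Cadet, Frédéric; Fontaine, Nicolas; Vetrivel, Iyanar; Ng Fuk Chong, Matthieu; Savriama, Olivier; Cadet, Xavier; Charton, Philippe |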
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central 2018 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6191906/ https://www.ncbi.nlm.nih.gov/pubmed/30326841 http://dx.doi.org/10.1186/s12859-018-2407-8 |
_version_ | 1783363804297428992 |
---|---|
author | Cadet, Frédéric; Fontaine, Nicolas; Vetrivel, Iyanar; Ng Fuk Chong, Matthieu; Savriama, Olivier; Cadet, Xavier; Charton, Philippe |
author_facet | Cadet, Frédéric; Fontaine, Nicolas; Vetrivel, Iyanar; Ng Fuk Chong, Matthieu; Savriama, Olivier; Cadet, Xavier; Charton, Philippe |
author_sort | Cadet, Frédéric |
collection | PubMed |
description | BACKGROUND: Connecting the dots between the protein sequence and its function is of fundamental interest for protein engineers. In-silico methods are useful in this quest especially when structural information is not available. In this study we propose a mutant library screening tool called iSAR (innovative Sequence Activity Relationship) that relies on the physicochemical properties of the amino acids, digital signal processing and partial least squares regression to uncover these sequence-function correlations. RESULTS: We show that the digitalized representation of the protein sequence in the form of a Fourier spectrum can be used as an efficient descriptor to model the sequence-activity relationship of proteins. The iSAR methodology that we have developed identifies high fitness mutants from mutant libraries relying on physicochemical properties of the amino acids, digital signal processing and regression techniques. iSAR correlates variations caused by mutations in spectra with biological activity/fitness. It takes into account the impact of mutations on the whole spectrum and does not focus on local fitness alone. The utility of the method is illustrated on 4 datasets: cytochrome P450 for thermostability, TNF-alpha for binding affinity, GLP-2 for potency and enterotoxins for thermostability. The choice of the datasets has been made such as to illustrate the ability of the method to perform when limited training data is available and also when novel mutations appear in the test set, that have not been featured in the training set. CONCLUSION: The combination of Fast Fourier Transform and Partial Least Squares regression is efficient in capturing the effects of mutations on the function of the protein. iSAR is a fast algorithm which can be implemented with limited computational resources and can make effective predictions even if the training set is limited in size. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (10.1186/s12859-018-2407-8) contains supplementary material, which is available to authorized users. |
format | Online Article Text |
id | pubmed-6191906 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2018 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-61919062018-10-23 Application of fourier transform and proteochemometrics principles to protein engineering Cadet, Frédéric Fontaine, Nicolas Vetrivel, Iyanar Ng Fuk Chong, Matthieu Savriama, Olivier Cadet, Xavier Charton, Philippe BMC Bioinformatics Research Article BACKGROUND: Connecting the dots between the protein sequence and its function is of fundamental interest for protein engineers. In-silico methods are useful in this quest especially when structural information is not available. In this study we propose a mutant library screening tool called iSAR (innovative Sequence Activity Relationship) that relies on the physicochemical properties of the amino acids, digital signal processing and partial least squares regression to uncover these sequence-function correlations. RESULTS: We show that the digitalized representation of the protein sequence in the form of a Fourier spectrum can be used as an efficient descriptor to model the sequence-activity relationship of proteins. The iSAR methodology that we have developed identifies high fitness mutants from mutant libraries relying on physicochemical properties of the amino acids, digital signal processing and regression techniques. iSAR correlates variations caused by mutations in spectra with biological activity/fitness. It takes into account the impact of mutations on the whole spectrum and does not focus on local fitness alone. The utility of the method is illustrated on 4 datasets: cytochrome P450 for thermostability, TNF-alpha for binding affinity, GLP-2 for potency and enterotoxins for thermostability. The choice of the datasets has been made such as to illustrate the ability of the method to perform when limited training data is available and also when novel mutations appear in the test set, that have not been featured in the training set. CONCLUSION: The combination of Fast Fourier Transform and Partial Least Squares regression is efficient in capturing the effects of mutations on the function of the protein. iSAR is a fast algorithm which can be implemented with limited computational resources and can make effective predictions even if the training set is limited in size. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (10.1186/s12859-018-2407-8) contains supplementary material, which is available to authorized users. BioMed Central 2018-10-16 /pmc/articles/PMC6191906/ /pubmed/30326841 http://dx.doi.org/10.1186/s12859-018-2407-8 Text en © The Author(s). 2018 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated. |
spellingShingle | Research Article Cadet, Frédéric Fontaine, Nicolas Vetrivel, Iyanar Ng Fuk Chong, Matthieu Savriama, Olivier Cadet, Xavier Charton, Philippe Application of fourier transform and proteochemometrics principles to protein engineering |
title | Application of fourier transform and proteochemometrics principles to protein engineering |
title_full | Application of fourier transform and proteochemometrics principles to protein engineering |
title_fullStr | Application of fourier transform and proteochemometrics principles to protein engineering |
title_full_unstemmed | Application of fourier transform and proteochemometrics principles to protein engineering |
title_short | Application of fourier transform and proteochemometrics principles to protein engineering |
title_sort | application of fourier transform and proteochemometrics principles to protein engineering |
topic | Research Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6191906/ https://www.ncbi.nlm.nih.gov/pubmed/30326841 http://dx.doi.org/10.1186/s12859-018-2407-8 |
work_keys_str_mv | AT cadetfrederic applicationoffouriertransformandproteochemometricsprinciplestoproteinengineering AT fontainenicolas applicationoffouriertransformandproteochemometricsprinciplestoproteinengineering AT vetriveliyanar applicationoffouriertransformandproteochemometricsprinciplestoproteinengineering AT ngfukchongmatthieu applicationoffouriertransformandproteochemometricsprinciplestoproteinengineering AT savriamaolivier applicationoffouriertransformandproteochemometricsprinciplestoproteinengineering AT cadetxavier applicationoffouriertransformandproteochemometricsprinciplestoproteinengineering AT chartonphilippe applicationoffouriertransformandproteochemometricsprinciplestoproteinengineering |
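
The pipeline summarized in the description field (encode each residue with a physicochemical property, take the Fourier spectrum of the resulting signal, then regress activity on the spectrum with partial least squares) can be illustrated with a minimal Python sketch. This is not the authors' iSAR implementation. Everything specific here is an assumption made for illustration: the Kyte-Doolittle hydropathy scale stands in for whatever property index the paper uses, the single-response NIPALS routine stands in for its PLS step, the function names are hypothetical, and the peptide sequences and activity values are invented toy data rather than any of the four datasets named in the record.

```python
import numpy as np

# Kyte-Doolittle hydropathy values, used only as a stand-in property index
# (assumption: the record does not say which index iSAR uses).
KD = {"A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5,
      "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9,
      "M": 1.9, "F": 2.8, "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9,
      "Y": -1.3, "V": 4.2}


def spectrum(seq):
    """Encode a sequence numerically and return its FFT magnitude spectrum."""
    signal = np.array([KD[aa] for aa in seq], dtype=float)
    return np.abs(np.fft.rfft(signal))


def pls1_fit(X, y, n_components):
    """Single-response PLS regression via NIPALS (hypothetical helper)."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xk, yk = X - x_mean, y - y_mean
    W, P, q = [], [], []
    for _ in range(n_components):
        w = Xk.T @ yk                      # weight vector: covariance with y
        norm = np.linalg.norm(w)
        if norm < 1e-12:                   # nothing left to explain
            break
        w = w / norm
        t = Xk @ w                         # scores
        tt = t @ t
        p = Xk.T @ t / tt                  # X loadings
        c = yk @ t / tt                    # y loading
        Xk = Xk - np.outer(t, p)           # deflate X
        yk = yk - c * t                    # deflate y
        W.append(w); P.append(p); q.append(c)
    W, P, q = np.array(W).T, np.array(P).T, np.array(q)
    coef = W @ np.linalg.solve(P.T @ W, q)  # regression coefficients
    return coef, x_mean, y_mean


def pls1_predict(model, X):
    coef, x_mean, y_mean = model
    return (X - x_mean) @ coef + y_mean


# Invented toy library: equal-length variants with made-up activity values.
train_seqs = ["ACDEFGHI", "ACDEFGHL", "ACDKFGHI", "AVDEFGHI", "ACDEWGHI"]
train_act = np.array([1.0, 1.2, 0.7, 1.5, 0.9])

X_train = np.array([spectrum(s) for s in train_seqs])
model = pls1_fit(X_train, train_act, n_components=2)
train_pred = pls1_predict(model, X_train)

# Score a variant that combines two mutations seen separately in training.
new_pred = pls1_predict(model, np.array([spectrum("AVDEFGHL")]))
print(new_pred)
```

Because a single substitution changes every Fourier coefficient of the encoded signal, the regression sees each mutation's effect on the whole spectrum, which mirrors the record's remark that the method does not focus on local fitness alone. The sketch assumes equal-length variants; handling insertions or deletions would need padding or another alignment choice not described in this record.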