
CysPresso: a classification model utilizing deep learning protein representations to predict recombinant expression of cysteine-dense peptides

BACKGROUND: Cysteine-dense peptides (CDPs) are an attractive pharmaceutical scaffold that display extreme biochemical properties, low immunogenicity, and the ability to bind targets with high affinity and selectivity. While many CDPs have potential and confirmed therapeutic uses, synthesis of CDPs i...


Bibliographic Details
Main authors: Ouellet, Sébastien; Ferguson, Larissa; Lau, Angus Z.; Lim, Tony K. Y.
Format: Online Article Text
Language: English
Published: BioMed Central 2023
Subjects: Research
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10189939/
https://www.ncbi.nlm.nih.gov/pubmed/37193950
http://dx.doi.org/10.1186/s12859-023-05327-8
author Ouellet, Sébastien
Ferguson, Larissa
Lau, Angus Z.
Lim, Tony K. Y.
collection PubMed
description BACKGROUND: Cysteine-dense peptides (CDPs) are an attractive pharmaceutical scaffold that displays extreme biochemical properties, low immunogenicity, and the ability to bind targets with high affinity and selectivity. While many CDPs have potential and confirmed therapeutic uses, synthesis of CDPs is a challenge. Recent advances have made the recombinant expression of CDPs a viable alternative to chemical synthesis. Moreover, identifying CDPs that can be expressed in mammalian cells is crucial in predicting their compatibility with gene therapy and mRNA therapy. Currently, we lack the ability to identify CDPs that will express recombinantly in mammalian cells without labour-intensive experimentation. To address this, we developed CysPresso, a novel machine learning model that predicts recombinant expression of CDPs based on primary sequence. RESULTS: We tested various protein representations generated by deep learning algorithms (SeqVec, proteInfer, AlphaFold2) for their suitability in predicting CDP expression and found that AlphaFold2 representations possessed the best predictive features. We then optimized the model by concatenation of AlphaFold2 representations, time series transformation with random convolutional kernels, and dataset partitioning. CONCLUSION: Our novel model, CysPresso, is the first to successfully predict recombinant CDP expression in mammalian cells and is particularly well suited for predicting recombinant expression of knottin peptides. When preprocessing the deep learning protein representation for supervised machine learning, we found that random convolutional kernel transformation preserves more information pertinent to predicting expressibility than embedding averaging. Our study showcases the applicability of deep learning-based protein representations, such as those provided by AlphaFold2, in tasks beyond structure prediction.
SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1186/s12859-023-05327-8.
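The abstract contrasts two ways of turning a per-residue representation into a fixed-length feature vector: embedding averaging and a random convolutional kernel transformation. The sketch below is only an illustration of that contrast, not the authors' pipeline. It assumes a ROCKET-style transform (random mean-centred kernels, with the maximum and the proportion of positive values kept per kernel), and the function names, kernel lengths, and toy embeddings are all hypothetical.

```python
import random

def mean_pool(embedding):
    """Average a per-residue embedding (L rows x D columns) into one D-vector."""
    n = len(embedding)
    dim = len(embedding[0])
    return [sum(row[d] for row in embedding) / n for d in range(dim)]

def make_kernels(n_kernels, dim, seed=0):
    """Draw random 1-D kernels as (channel, mean-centred weights, bias).

    Kernel lengths and distributions are illustrative assumptions.
    """
    rng = random.Random(seed)
    kernels = []
    for _ in range(n_kernels):
        length = rng.choice([3, 5, 7])
        weights = [rng.gauss(0.0, 1.0) for _ in range(length)]
        mean_w = sum(weights) / length
        weights = [w - mean_w for w in weights]
        kernels.append((rng.randrange(dim), weights, rng.uniform(-1.0, 1.0)))
    return kernels

def kernel_transform(embedding, kernels):
    """Slide each kernel along the residue axis; keep max and PPV per kernel."""
    features = []
    for channel, weights, bias in kernels:
        series = [row[channel] for row in embedding]
        k = len(weights)
        # Zero-pad both ends so even very short peptides yield L windows.
        padded = [0.0] * (k // 2) + series + [0.0] * (k // 2)
        outputs = [
            sum(w * padded[i + j] for j, w in enumerate(weights)) + bias
            for i in range(len(padded) - k + 1)
        ]
        features.append(max(outputs))                               # strongest response
        features.append(sum(o > 0 for o in outputs) / len(outputs))  # PPV
    return features

# Toy embeddings (hypothetical): the same residues in opposite order.
a = [[1.0, 0.0], [2.0, 1.0], [3.0, 0.0], [4.0, 1.0]]
b = list(reversed(a))
kernels = make_kernels(20, dim=2)
```

Averaging discards residue order, so `mean_pool(a)` and `mean_pool(b)` are identical, whereas `kernel_transform(a, kernels)` and `kernel_transform(b, kernels)` will generally differ because each kernel responds to local patterns along the sequence. This is one plausible reading of why the kernel transform could retain more information relevant to expressibility.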
format Online
Article
Text
id pubmed-10189939
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-10189939 2023-05-18 CysPresso: a classification model utilizing deep learning protein representations to predict recombinant expression of cysteine-dense peptides Ouellet, Sébastien Ferguson, Larissa Lau, Angus Z. Lim, Tony K. Y. BMC Bioinformatics Research BioMed Central 2023-05-16 /pmc/articles/PMC10189939/ /pubmed/37193950 http://dx.doi.org/10.1186/s12859-023-05327-8 Text en © The Author(s) 2023 https://creativecommons.org/licenses/by/4.0/
Open Access: This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit https://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (https://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
title CysPresso: a classification model utilizing deep learning protein representations to predict recombinant expression of cysteine-dense peptides
topic Research