CysPresso: a classification model utilizing deep learning protein representations to predict recombinant expression of cysteine-dense peptides
Main authors: | Ouellet, Sébastien; Ferguson, Larissa; Lau, Angus Z.; Lim, Tony K. Y. |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central, 2023 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10189939/ https://www.ncbi.nlm.nih.gov/pubmed/37193950 http://dx.doi.org/10.1186/s12859-023-05327-8 |
---|---|
author | Ouellet, Sébastien; Ferguson, Larissa; Lau, Angus Z.; Lim, Tony K. Y. |
collection | PubMed |
description | BACKGROUND: Cysteine-dense peptides (CDPs) are an attractive pharmaceutical scaffold that display extreme biochemical properties, low immunogenicity, and the ability to bind targets with high affinity and selectivity. While many CDPs have potential and confirmed therapeutic uses, synthesis of CDPs is a challenge. Recent advances have made the recombinant expression of CDPs a viable alternative to chemical synthesis. Moreover, identifying CDPs that can be expressed in mammalian cells is crucial in predicting their compatibility with gene therapy and mRNA therapy. Currently, we lack the ability to identify CDPs that will express recombinantly in mammalian cells without labour-intensive experimentation. To address this, we developed CysPresso, a novel machine learning model that predicts recombinant expression of CDPs based on primary sequence. RESULTS: We tested various protein representations generated by deep learning algorithms (SeqVec, proteInfer, AlphaFold2) for their suitability in predicting CDP expression and found that AlphaFold2 representations possessed the best predictive features. We then optimized the model by concatenating AlphaFold2 representations, applying a time series transformation with random convolutional kernels, and partitioning the dataset. CONCLUSION: Our novel model, CysPresso, is the first to successfully predict recombinant CDP expression in mammalian cells and is particularly well suited for predicting recombinant expression of knottin peptides. When preprocessing the deep learning protein representation for supervised machine learning, we found that random convolutional kernel transformation preserves more pertinent information relevant for predicting expressibility than embedding averaging. Our study showcases the applicability of deep learning-based protein representations, such as those provided by AlphaFold2, in tasks beyond structure prediction. 
SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1186/s12859-023-05327-8. |
format | Online Article Text |
id | pubmed-10189939 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2023 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-10189939, 2023-05-18. CysPresso: a classification model utilizing deep learning protein representations to predict recombinant expression of cysteine-dense peptides. Ouellet, Sébastien; Ferguson, Larissa; Lau, Angus Z.; Lim, Tony K. Y. BMC Bioinformatics, Research. BioMed Central, 2023-05-16. /pmc/articles/PMC10189939/ /pubmed/37193950 http://dx.doi.org/10.1186/s12859-023-05327-8. Text, en. © The Author(s) 2023. Open Access: This article is licensed under a Creative Commons Attribution 4.0 International License (https://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data. |
title | CysPresso: a classification model utilizing deep learning protein representations to predict recombinant expression of cysteine-dense peptides |
topic | Research |
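The abstract contrasts two ways of turning a per-residue protein representation into a fixed-length feature vector: embedding averaging and a random convolutional kernel transformation. The NumPy sketch below illustrates that contrast under stated assumptions. It is a simplified, ROCKET-style transform written for illustration, not the CysPresso code and not a library implementation: every kernel spans all embedding channels (multivariate ROCKET variants sample a channel subset), dilation is sampled against a single reference length, and a sequence too short for a kernel simply yields zero features. The two 8-dimensional arrays standing in for concatenated AlphaFold2 representations are random placeholders, and the helper names `mean_pool`, `make_kernels`, `apply_kernel`, and `rocket_transform` are hypothetical.

```python
import numpy as np


def mean_pool(embedding: np.ndarray) -> np.ndarray:
    """Embedding averaging: collapse an (L, D) per-residue array into a D-vector."""
    return embedding.mean(axis=0)


def make_kernels(n_kernels: int, n_channels: int, ref_len: int, rng: np.random.Generator) -> list:
    """Sample random dilated kernels following ROCKET's sampling scheme (simplified)."""
    kernels = []
    for _ in range(n_kernels):
        length = int(rng.choice([7, 9, 11]))
        weights = rng.normal(0.0, 1.0, size=(length, n_channels))
        weights -= weights.mean(axis=0)  # mean-centre the weights of each channel
        bias = rng.uniform(-1.0, 1.0)
        # Exponentially distributed dilation, bounded so the kernel fits ref_len.
        max_exp = np.log2(max((ref_len - 1) / (length - 1), 1.0))
        dilation = int(2 ** rng.uniform(0.0, max_exp))
        # Half of the kernels use "same"-style zero padding, the rest none.
        padding = ((length - 1) * dilation) // 2 if rng.integers(2) else 0
        kernels.append((weights, bias, dilation, padding))
    return kernels


def apply_kernel(x: np.ndarray, weights: np.ndarray, bias: float, dilation: int, padding: int):
    """Dilated 1D convolution over residues; returns (max, proportion of positive values)."""
    length = weights.shape[0]
    if padding:
        x = np.pad(x, ((padding, padding), (0, 0)))  # zero-pad the residue axis only
    n_out = x.shape[0] - (length - 1) * dilation
    if n_out <= 0:
        return 0.0, 0.0  # simplification: sequence shorter than the receptive field
    out = np.full(n_out, bias)
    for k in range(length):
        # out[t] += sum_c weights[k, c] * x[t + k * dilation, c]
        out += x[k * dilation : k * dilation + n_out] @ weights[k]
    return out.max(), (out > 0).mean()


def rocket_transform(x: np.ndarray, kernels: list) -> np.ndarray:
    """Concatenate (max, PPV) from every kernel into one feature vector."""
    feats = []
    for weights, bias, dilation, padding in kernels:
        feats.extend(apply_kernel(x, weights, bias, dilation, padding))
    return np.asarray(feats)


rng = np.random.default_rng(0)
# Hypothetical stand-ins for two per-residue representations of one 40-residue peptide.
rep_a = rng.normal(size=(40, 8))
rep_b = rng.normal(size=(40, 8))
x = np.concatenate([rep_a, rep_b], axis=1)  # (40, 16): concatenated along the feature axis

kernels = make_kernels(n_kernels=100, n_channels=x.shape[1], ref_len=40, rng=rng)
avg_features = mean_pool(x)                     # one value per channel
rocket_features = rocket_transform(x, kernels)  # two values per kernel
print(avg_features.shape, rocket_features.shape)  # prints: (16,) (200,)
```

One intuition, offered as an assumption rather than the authors' explanation: averaging is invariant to residue order, so positional patterns in the representation are discarded, whereas each random kernel responds to local, dilated patterns along the sequence and summarizes them with a maximum and a proportion of positive values (PPV). Either fixed-length vector can then be passed to a standard supervised classifier.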