Conditional generative modeling for de novo protein design with hierarchical functions

Bibliographic Details
Main authors: Kucera, Tim; Togninalli, Matteo; Meng-Papaxanthos, Laetitia
Format: Online Article Text
Language: English
Published: Oxford University Press 2022
Subjects: Original Papers
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9237736/
https://www.ncbi.nlm.nih.gov/pubmed/35639661
http://dx.doi.org/10.1093/bioinformatics/btac353
author Kucera, Tim
Togninalli, Matteo
Meng-Papaxanthos, Laetitia
collection PubMed
description MOTIVATION: Protein design has become increasingly important for medical and biotechnological applications. Because of the complex mechanisms underlying protein formation, the creation of a novel protein requires tedious and time-consuming computational or experimental protocols. At the same time, machine learning has enabled the solving of complex problems by leveraging large amounts of available data, more recently with great improvements on the domain of generative modeling. Yet, generative models have mainly been applied to specific sub-problems of protein design.
RESULTS: Here, we approach the problem of general-purpose protein design conditioned on functional labels of the hierarchical Gene Ontology. Since a canonical way to evaluate generative models in this domain is missing, we devise an evaluation scheme of several biologically and statistically inspired metrics. We then develop the conditional generative adversarial network ProteoGAN and show that it outperforms several classic and more recent deep-learning baselines for protein sequence generation. We further give insights into the model by analyzing hyperparameters and ablation baselines. Lastly, we hypothesize that a functionally conditional model could generate proteins with novel functions by combining labels and provide first steps into this direction of research.
AVAILABILITY AND IMPLEMENTATION: The code and data underlying this article are available on GitHub at https://github.com/timkucera/proteogan, and can be accessed with doi:10.5281/zenodo.6591379.
SUPPLEMENTARY INFORMATION: Supplemental data are available at Bioinformatics online.
format Online
Article
Text
id pubmed-9237736
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-9237736 2022-06-29 Bioinformatics Original Papers Oxford University Press 2022-05-26 /pmc/articles/PMC9237736/ /pubmed/35639661 http://dx.doi.org/10.1093/bioinformatics/btac353 Text en
© The Author(s) 2022. Published by Oxford University Press. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
title Conditional generative modeling for de novo protein design with hierarchical functions
topic Original Papers
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9237736/
https://www.ncbi.nlm.nih.gov/pubmed/35639661
http://dx.doi.org/10.1093/bioinformatics/btac353