Conditional generative modeling for de novo protein design with hierarchical functions
MOTIVATION: Protein design has become increasingly important for medical and biotechnological applications. Because of the complex mechanisms underlying protein formation, the creation of a novel protein requires tedious and time-consuming computational or experimental protocols. At the same time, m...
Main authors: | Kucera, Tim; Togninalli, Matteo; Meng-Papaxanthos, Laetitia |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Oxford University Press 2022 |
Subjects: | Original Papers |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9237736/ https://www.ncbi.nlm.nih.gov/pubmed/35639661 http://dx.doi.org/10.1093/bioinformatics/btac353 |
_version_ | 1784736863560400896 |
---|---|
author | Kucera, Tim Togninalli, Matteo Meng-Papaxanthos, Laetitia |
author_facet | Kucera, Tim Togninalli, Matteo Meng-Papaxanthos, Laetitia |
author_sort | Kucera, Tim |
collection | PubMed |
description | MOTIVATION: Protein design has become increasingly important for medical and biotechnological applications. Because of the complex mechanisms underlying protein formation, the creation of a novel protein requires tedious and time-consuming computational or experimental protocols. At the same time, machine learning has enabled the solving of complex problems by leveraging large amounts of available data, more recently with great improvements on the domain of generative modeling. Yet, generative models have mainly been applied to specific sub-problems of protein design. RESULTS: Here, we approach the problem of general-purpose protein design conditioned on functional labels of the hierarchical Gene Ontology. Since a canonical way to evaluate generative models in this domain is missing, we devise an evaluation scheme of several biologically and statistically inspired metrics. We then develop the conditional generative adversarial network ProteoGAN and show that it outperforms several classic and more recent deep-learning baselines for protein sequence generation. We further give insights into the model by analyzing hyperparameters and ablation baselines. Lastly, we hypothesize that a functionally conditional model could generate proteins with novel functions by combining labels and provide first steps into this direction of research. AVAILABILITY AND IMPLEMENTATION: The code and data underlying this article are available on GitHub at https://github.com/timkucera/proteogan, and can be accessed with doi:10.5281/zenodo.6591379. SUPPLEMENTARY INFORMATION: Supplemental data are available at Bioinformatics online. |
format | Online Article Text |
id | pubmed-9237736 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2022 |
publisher | Oxford University Press |
record_format | MEDLINE/PubMed |
spelling | pubmed-9237736 2022-06-29 Conditional generative modeling for de novo protein design with hierarchical functions Kucera, Tim Togninalli, Matteo Meng-Papaxanthos, Laetitia Bioinformatics Original Papers MOTIVATION: Protein design has become increasingly important for medical and biotechnological applications. Because of the complex mechanisms underlying protein formation, the creation of a novel protein requires tedious and time-consuming computational or experimental protocols. At the same time, machine learning has enabled the solving of complex problems by leveraging large amounts of available data, more recently with great improvements on the domain of generative modeling. Yet, generative models have mainly been applied to specific sub-problems of protein design. RESULTS: Here, we approach the problem of general-purpose protein design conditioned on functional labels of the hierarchical Gene Ontology. Since a canonical way to evaluate generative models in this domain is missing, we devise an evaluation scheme of several biologically and statistically inspired metrics. We then develop the conditional generative adversarial network ProteoGAN and show that it outperforms several classic and more recent deep-learning baselines for protein sequence generation. We further give insights into the model by analyzing hyperparameters and ablation baselines. Lastly, we hypothesize that a functionally conditional model could generate proteins with novel functions by combining labels and provide first steps into this direction of research. AVAILABILITY AND IMPLEMENTATION: The code and data underlying this article are available on GitHub at https://github.com/timkucera/proteogan, and can be accessed with doi:10.5281/zenodo.6591379. SUPPLEMENTARY INFORMATION: Supplemental data are available at Bioinformatics online.
Oxford University Press 2022-05-26 /pmc/articles/PMC9237736/ /pubmed/35639661 http://dx.doi.org/10.1093/bioinformatics/btac353 Text en © The Author(s) 2022. Published by Oxford University Press. https://creativecommons.org/licenses/by/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited. |
spellingShingle | Original Papers Kucera, Tim Togninalli, Matteo Meng-Papaxanthos, Laetitia Conditional generative modeling for de novo protein design with hierarchical functions |
title | Conditional generative modeling for de novo protein design with hierarchical functions |
title_full | Conditional generative modeling for de novo protein design with hierarchical functions |
title_fullStr | Conditional generative modeling for de novo protein design with hierarchical functions |
title_full_unstemmed | Conditional generative modeling for de novo protein design with hierarchical functions |
title_short | Conditional generative modeling for de novo protein design with hierarchical functions |
title_sort | conditional generative modeling for de novo protein design with hierarchical functions |
topic | Original Papers |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9237736/ https://www.ncbi.nlm.nih.gov/pubmed/35639661 http://dx.doi.org/10.1093/bioinformatics/btac353 |
work_keys_str_mv | AT kuceratim conditionalgenerativemodelingfordenovoproteindesignwithhierarchicalfunctions AT togninallimatteo conditionalgenerativemodelingfordenovoproteindesignwithhierarchicalfunctions AT mengpapaxanthoslaetitia conditionalgenerativemodelingfordenovoproteindesignwithhierarchicalfunctions |