Generating functional protein variants with variational autoencoders
Main Authors: | Hawkins-Hooker, Alex; Depardieu, Florence; Baur, Sebastien; Couairon, Guillaume; Chen, Arthur; Bikard, David |
Format: | Online Article Text |
Language: | English |
Published: | Public Library of Science, 2021 |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7946179/ https://www.ncbi.nlm.nih.gov/pubmed/33635868 http://dx.doi.org/10.1371/journal.pcbi.1008736 |
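The article's abstract describes variational autoencoders trained on aligned protein sequences (the MSA VAE) that encode a sequence into a latent vector and decode it back into per-position amino-acid distributions, from which novel variants are sampled. As a rough illustration of that setup only, here is a toy sketch with a made-up dense architecture and untrained random weights (`ToyMSAVAE`, the layer sizes, and the example sequence are all illustrative assumptions, not the authors' model or code):

```python
import numpy as np

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY-"  # 20 residues plus a gap symbol for aligned (MSA) input
AA_INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}

def one_hot(seq):
    """One-hot encode an aligned protein sequence into an (L x 21) matrix."""
    x = np.zeros((len(seq), len(AMINO_ACIDS)))
    x[np.arange(len(seq)), [AA_INDEX[a] for a in seq]] = 1.0
    return x

rng = np.random.default_rng(0)

class ToyMSAVAE:
    """Tiny dense VAE over a flattened one-hot alignment (illustrative only)."""
    def __init__(self, seq_len, latent_dim=8):
        self.d = seq_len * len(AMINO_ACIDS)
        # Random, untrained weights: this sketches the data flow, not a trained model.
        self.W_mu = rng.normal(0, 0.01, (self.d, latent_dim))
        self.W_logvar = rng.normal(0, 0.01, (self.d, latent_dim))
        self.W_dec = rng.normal(0, 0.01, (latent_dim, self.d))

    def encode(self, x):
        h = x.reshape(-1)
        return h @ self.W_mu, h @ self.W_logvar  # latent mean and log-variance

    def reparameterize(self, mu, logvar):
        # Standard VAE reparameterization trick: z = mu + sigma * eps
        eps = rng.normal(size=mu.shape)
        return mu + np.exp(0.5 * logvar) * eps

    def decode(self, z):
        # Softmax over the 21 symbols at each alignment position
        logits = (z @ self.W_dec).reshape(-1, len(AMINO_ACIDS))
        p = np.exp(logits - logits.max(axis=1, keepdims=True))
        return p / p.sum(axis=1, keepdims=True)

    def sample_variant(self, seq):
        """Encode a sequence, perturb in latent space, and sample a variant."""
        mu, logvar = self.encode(one_hot(seq))
        probs = self.decode(self.reparameterize(mu, logvar))
        return "".join(AMINO_ACIDS[rng.choice(len(AMINO_ACIDS), p=p)] for p in probs)

vae = ToyMSAVAE(seq_len=12)
variant = vae.sample_variant("MKF-LVNHDGQA")  # hypothetical 12-column aligned fragment
print(len(variant))  # 12: the decoded variant keeps the alignment's fixed length
```

Because the weights here are random, the sampled "variant" is noise; the point is only the encode/reparameterize/decode/sample pipeline that the paper's MSA VAE follows at a much larger scale.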
_version_ | 1783662997598633984 |
author | Hawkins-Hooker, Alex Depardieu, Florence Baur, Sebastien Couairon, Guillaume Chen, Arthur Bikard, David |
author_facet | Hawkins-Hooker, Alex Depardieu, Florence Baur, Sebastien Couairon, Guillaume Chen, Arthur Bikard, David |
author_sort | Hawkins-Hooker, Alex |
collection | PubMed |
description | The vast expansion of protein sequence databases provides an opportunity for new protein design approaches which seek to learn the sequence-function relationship directly from natural sequence variation. Deep generative models trained on protein sequence data have been shown to learn biologically meaningful representations helpful for a variety of downstream tasks, but their potential for direct use in the design of novel proteins remains largely unexplored. Here we show that variational autoencoders trained on a dataset of almost 70000 luciferase-like oxidoreductases can be used to generate novel, functional variants of the luxA bacterial luciferase. We propose separate VAE models to work with aligned sequence input (MSA VAE) and raw sequence input (AR-VAE), and offer evidence that while both are able to reproduce patterns of amino acid usage characteristic of the family, the MSA VAE is better able to capture long-distance dependencies reflecting the influence of 3D structure. To confirm the practical utility of the models, we used them to generate variants of luxA whose luminescence activity was validated experimentally. We further showed that conditional variants of both models could be used to increase the solubility of luxA without disrupting function. Altogether 6/12 of the variants generated using the unconditional AR-VAE and 9/11 generated using the unconditional MSA VAE retained measurable luminescence, together with all 23 of the less distant variants generated by conditional versions of the models; the most distant functional variant contained 35 differences relative to the nearest training set sequence. These results demonstrate the feasibility of using deep generative models to explore the space of possible protein sequences and generate useful variants, providing a method complementary to rational design and directed evolution approaches. |
format | Online Article Text |
id | pubmed-7946179 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2021 |
publisher | Public Library of Science |
record_format | MEDLINE/PubMed |
spelling | pubmed-79461792021-03-19 Generating functional protein variants with variational autoencoders Hawkins-Hooker, Alex Depardieu, Florence Baur, Sebastien Couairon, Guillaume Chen, Arthur Bikard, David PLoS Comput Biol Research Article The vast expansion of protein sequence databases provides an opportunity for new protein design approaches which seek to learn the sequence-function relationship directly from natural sequence variation. Deep generative models trained on protein sequence data have been shown to learn biologically meaningful representations helpful for a variety of downstream tasks, but their potential for direct use in the design of novel proteins remains largely unexplored. Here we show that variational autoencoders trained on a dataset of almost 70000 luciferase-like oxidoreductases can be used to generate novel, functional variants of the luxA bacterial luciferase. We propose separate VAE models to work with aligned sequence input (MSA VAE) and raw sequence input (AR-VAE), and offer evidence that while both are able to reproduce patterns of amino acid usage characteristic of the family, the MSA VAE is better able to capture long-distance dependencies reflecting the influence of 3D structure. To confirm the practical utility of the models, we used them to generate variants of luxA whose luminescence activity was validated experimentally. We further showed that conditional variants of both models could be used to increase the solubility of luxA without disrupting function. Altogether 6/12 of the variants generated using the unconditional AR-VAE and 9/11 generated using the unconditional MSA VAE retained measurable luminescence, together with all 23 of the less distant variants generated by conditional versions of the models; the most distant functional variant contained 35 differences relative to the nearest training set sequence. 
These results demonstrate the feasibility of using deep generative models to explore the space of possible protein sequences and generate useful variants, providing a method complementary to rational design and directed evolution approaches. Public Library of Science 2021-02-26 /pmc/articles/PMC7946179/ /pubmed/33635868 http://dx.doi.org/10.1371/journal.pcbi.1008736 Text en © 2021 Hawkins-Hooker et al http://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/) , which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. |
spellingShingle | Research Article Hawkins-Hooker, Alex Depardieu, Florence Baur, Sebastien Couairon, Guillaume Chen, Arthur Bikard, David Generating functional protein variants with variational autoencoders |
title | Generating functional protein variants with variational autoencoders |
title_full | Generating functional protein variants with variational autoencoders |
title_fullStr | Generating functional protein variants with variational autoencoders |
title_full_unstemmed | Generating functional protein variants with variational autoencoders |
title_short | Generating functional protein variants with variational autoencoders |
title_sort | generating functional protein variants with variational autoencoders |
topic | Research Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7946179/ https://www.ncbi.nlm.nih.gov/pubmed/33635868 http://dx.doi.org/10.1371/journal.pcbi.1008736 |
work_keys_str_mv | AT hawkinshookeralex generatingfunctionalproteinvariantswithvariationalautoencoders AT depardieuflorence generatingfunctionalproteinvariantswithvariationalautoencoders AT baursebastien generatingfunctionalproteinvariantswithvariationalautoencoders AT couaironguillaume generatingfunctionalproteinvariantswithvariationalautoencoders AT chenarthur generatingfunctionalproteinvariantswithvariationalautoencoders AT bikarddavid generatingfunctionalproteinvariantswithvariationalautoencoders |