
A Generative Neural Network for Maximizing Fitness and Diversity of Synthetic DNA and Protein Sequences


Bibliographic Details
Main Authors: Linder, Johannes, Bogard, Nicholas, Rosenberg, Alexander B., Seelig, Georg
Format: Online Article Text
Language: English
Published: 2020
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8694568/
https://www.ncbi.nlm.nih.gov/pubmed/32711843
http://dx.doi.org/10.1016/j.cels.2020.05.007
_version_ 1784619385021792256
author Linder, Johannes
Bogard, Nicholas
Rosenberg, Alexander B.
Seelig, Georg
author_facet Linder, Johannes
Bogard, Nicholas
Rosenberg, Alexander B.
Seelig, Georg
author_sort Linder, Johannes
collection PubMed
description Engineering gene and protein sequences with defined functional properties is a major goal of synthetic biology. Deep neural network models, together with gradient ascent-style optimization, show promise for sequence design. The generated sequences can however get stuck in local minima and often have low diversity. Here, we develop deep exploration networks (DENs), a class of activation-maximizing generative models, which minimize the cost of a neural network fitness predictor by gradient descent. By penalizing any two generated patterns on the basis of a similarity metric, DENs explicitly maximize sequence diversity. To avoid drifting into low-confidence regions of the predictor, we incorporate variational autoencoders to maintain the likelihood ratio of generated sequences. Using DENs, we engineered polyadenylation signals with more than 10-fold higher selection odds than the best gradient ascent-generated patterns, identified splice regulatory sequences predicted to result in highly differential splicing between cell lines, and improved on state-of-the-art results for protein design tasks.
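The description above outlines the deep exploration network (DEN) objective: generated sequences are scored by a neural network fitness predictor, and any two generated patterns are penalized for similarity so that the generator explicitly maximizes diversity. Below is a minimal, illustrative Python/PyTorch sketch of that combined objective. The names `generator`, `fitness_predictor`, and `diversity_weight`, the tensor shapes, and the use of cosine similarity as the similarity metric are assumptions for illustration only, not the authors' implementation; the variational-autoencoder likelihood term mentioned in the abstract is omitted for brevity.

```python
# Conceptual sketch of a DEN-style loss: maximize predicted fitness while
# penalizing similarity between pairs of generated sequences.
# All component names and shapes are illustrative assumptions.
import torch
import torch.nn.functional as F

def den_loss(generator, fitness_predictor, z_pair, diversity_weight=1.0):
    """Fitness term plus a pairwise diversity penalty for two generated sequences."""
    # Map two latent seeds to nucleotide probability profiles, e.g. shape (L, 4).
    seq_a = F.softmax(generator(z_pair[0]), dim=-1)
    seq_b = F.softmax(generator(z_pair[1]), dim=-1)

    # Fitness: the predictor's output is treated as a score to be maximized,
    # i.e., its cost is minimized by gradient descent on the generator.
    fitness = fitness_predictor(seq_a).mean() + fitness_predictor(seq_b).mean()

    # Diversity: penalize similarity between the two generated patterns
    # (cosine similarity stands in here for a generic similarity metric).
    similarity = F.cosine_similarity(seq_a.flatten(), seq_b.flatten(), dim=0)

    # Minimizing this loss maximizes fitness and pushes the pair apart.
    return -fitness + diversity_weight * similarity
```

A training loop under these assumptions would repeatedly sample pairs of latent seeds, evaluate this loss, and update the generator's parameters by gradient descent, so that high-fitness yet mutually dissimilar sequences are produced across seeds.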
format Online
Article
Text
id pubmed-8694568
institution National Center for Biotechnology Information
language English
publishDate 2020
record_format MEDLINE/PubMed
spelling pubmed-8694568 2021-12-22 A Generative Neural Network for Maximizing Fitness and Diversity of Synthetic DNA and Protein Sequences Linder, Johannes Bogard, Nicholas Rosenberg, Alexander B. Seelig, Georg Cell Syst Article Engineering gene and protein sequences with defined functional properties is a major goal of synthetic biology. Deep neural network models, together with gradient ascent-style optimization, show promise for sequence design. The generated sequences can however get stuck in local minima and often have low diversity. Here, we develop deep exploration networks (DENs), a class of activation-maximizing generative models, which minimize the cost of a neural network fitness predictor by gradient descent. By penalizing any two generated patterns on the basis of a similarity metric, DENs explicitly maximize sequence diversity. To avoid drifting into low-confidence regions of the predictor, we incorporate variational autoencoders to maintain the likelihood ratio of generated sequences. Using DENs, we engineered polyadenylation signals with more than 10-fold higher selection odds than the best gradient ascent-generated patterns, identified splice regulatory sequences predicted to result in highly differential splicing between cell lines, and improved on state-of-the-art results for protein design tasks. 2020-06-25 2020-07-22 /pmc/articles/PMC8694568/ /pubmed/32711843 http://dx.doi.org/10.1016/j.cels.2020.05.007 Text en This is an open access article under the CC BY-NC-ND license (https://creativecommons.org/licenses/by-nc-nd/4.0/).
spellingShingle Article
Linder, Johannes
Bogard, Nicholas
Rosenberg, Alexander B.
Seelig, Georg
A Generative Neural Network for Maximizing Fitness and Diversity of Synthetic DNA and Protein Sequences
title A Generative Neural Network for Maximizing Fitness and Diversity of Synthetic DNA and Protein Sequences
title_full A Generative Neural Network for Maximizing Fitness and Diversity of Synthetic DNA and Protein Sequences
title_fullStr A Generative Neural Network for Maximizing Fitness and Diversity of Synthetic DNA and Protein Sequences
title_full_unstemmed A Generative Neural Network for Maximizing Fitness and Diversity of Synthetic DNA and Protein Sequences
title_short A Generative Neural Network for Maximizing Fitness and Diversity of Synthetic DNA and Protein Sequences
title_sort generative neural network for maximizing fitness and diversity of synthetic dna and protein sequences
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8694568/
https://www.ncbi.nlm.nih.gov/pubmed/32711843
http://dx.doi.org/10.1016/j.cels.2020.05.007
work_keys_str_mv AT linderjohannes agenerativeneuralnetworkformaximizingfitnessanddiversityofsyntheticdnaandproteinsequences
AT bogardnicholas agenerativeneuralnetworkformaximizingfitnessanddiversityofsyntheticdnaandproteinsequences
AT rosenbergalexanderb agenerativeneuralnetworkformaximizingfitnessanddiversityofsyntheticdnaandproteinsequences
AT seeliggeorg agenerativeneuralnetworkformaximizingfitnessanddiversityofsyntheticdnaandproteinsequences
AT linderjohannes generativeneuralnetworkformaximizingfitnessanddiversityofsyntheticdnaandproteinsequences
AT bogardnicholas generativeneuralnetworkformaximizingfitnessanddiversityofsyntheticdnaandproteinsequences
AT rosenbergalexanderb generativeneuralnetworkformaximizingfitnessanddiversityofsyntheticdnaandproteinsequences
AT seeliggeorg generativeneuralnetworkformaximizingfitnessanddiversityofsyntheticdnaandproteinsequences