A Generative Neural Network for Maximizing Fitness and Diversity of Synthetic DNA and Protein Sequences
Engineering gene and protein sequences with defined functional properties is a major goal of synthetic biology. Deep neural network models, together with gradient ascent-style optimization, show promise for sequence design. The generated sequences can however get stuck in local minima and often have...
Main authors: | Linder, Johannes; Bogard, Nicholas; Rosenberg, Alexander B.; Seelig, Georg |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | 2020 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8694568/ https://www.ncbi.nlm.nih.gov/pubmed/32711843 http://dx.doi.org/10.1016/j.cels.2020.05.007 |
_version_ | 1784619385021792256 |
---|---|
author | Linder, Johannes Bogard, Nicholas Rosenberg, Alexander B. Seelig, Georg |
collection | PubMed |
description | Engineering gene and protein sequences with defined functional properties is a major goal of synthetic biology. Deep neural network models, together with gradient ascent-style optimization, show promise for sequence design. The generated sequences can however get stuck in local minima and often have low diversity. Here, we develop deep exploration networks (DENs), a class of activation-maximizing generative models, which minimize the cost of a neural network fitness predictor by gradient descent. By penalizing any two generated patterns on the basis of a similarity metric, DENs explicitly maximize sequence diversity. To avoid drifting into low-confidence regions of the predictor, we incorporate variational autoencoders to maintain the likelihood ratio of generated sequences. Using DENs, we engineered polyadenylation signals with more than 10-fold higher selection odds than the best gradient ascent-generated patterns, identified splice regulatory sequences predicted to result in highly differential splicing between cell lines, and improved on state-of-the-art results for protein design tasks. |
format | Online Article Text |
id | pubmed-8694568 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2020 |
record_format | MEDLINE/PubMed |
spelling | pubmed-8694568; 2021-12-22; Cell Syst; Article; 2020-06-25; 2020-07-22; Text en; This is an open access article under the CC BY-NC-ND license (https://creativecommons.org/licenses/by-nc-nd/4.0/). |
title | A Generative Neural Network for Maximizing Fitness and Diversity of Synthetic DNA and Protein Sequences |
topic | Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8694568/ https://www.ncbi.nlm.nih.gov/pubmed/32711843 http://dx.doi.org/10.1016/j.cels.2020.05.007 |
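
The description above says that DENs penalize any two generated patterns on the basis of a similarity metric while minimizing the cost of a fitness predictor. A minimal sketch of that idea follows. It is an illustration under stated assumptions, not the authors' implementation: the function names, the hinge form of the penalty, the `margin` and `weight` parameters, and the use of plain fitness scores in place of a neural network predictor are all hypothetical.

```python
import numpy as np


def one_hot(seq, alphabet="ACGT"):
    """Encode a sequence string as a (length, alphabet size) one-hot matrix."""
    idx = np.array([alphabet.index(c) for c in seq])
    out = np.zeros((len(seq), len(alphabet)))
    out[np.arange(len(seq)), idx] = 1.0
    return out


def similarity(x1, x2):
    """Fraction of positions at which two one-hot sequences agree."""
    return float(np.sum(x1 * x2) / x1.shape[0])


def similarity_penalty(x1, x2, margin=0.5):
    """Hinge penalty (assumed form): zero until similarity exceeds `margin`."""
    return max(0.0, similarity(x1, x2) - margin)


def total_cost(fitness1, fitness2, x1, x2, weight=1.0, margin=0.5):
    """Negative mean fitness plus a weighted similarity penalty.

    `fitness1` and `fitness2` stand in for the scores a fitness predictor
    would assign to the two generated sequences (hypothetical inputs).
    """
    fitness_cost = -0.5 * (fitness1 + fitness2)
    return fitness_cost + weight * similarity_penalty(x1, x2, margin)


# Two example sequences that agree at 5 of 8 positions.
a = one_hot("ACGTACGT")
b = one_hot("ACGTTTTT")
cost = total_cost(0.8, 0.6, a, b)
```

In this sketch a pair of sequences incurs no penalty as long as their similarity stays at or below `margin`, so lowering the cost rewards both high predicted fitness and dissimilar outputs. In a trained generator the same cost would be computed on differentiable sequence representations and minimized by gradient descent.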