Dirichlet Diffusion Score Model for Biological Sequence Generation
Designing biological sequences is an important challenge that requires satisfying complex constraints and thus is a natural problem to address with deep generative modeling. Diffusion generative models have achieved considerable success in many applications. Score-based generative stochastic differe...
Main Authors: | Avdeyev, Pavel; Shi, Chenlai; Tan, Yuhao; Dudnyk, Kseniia; Zhou, Jian |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Cornell University 2023 |
Subjects: | |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10246113/ https://www.ncbi.nlm.nih.gov/pubmed/37292476 |
_version_ | 1785054980404674560 |
---|---|
author | Avdeyev, Pavel Shi, Chenlai Tan, Yuhao Dudnyk, Kseniia Zhou, Jian |
author_facet | Avdeyev, Pavel Shi, Chenlai Tan, Yuhao Dudnyk, Kseniia Zhou, Jian |
author_sort | Avdeyev, Pavel |
collection | PubMed |
description | Designing biological sequences is an important challenge that requires satisfying complex constraints and thus is a natural problem to address with deep generative modeling. Diffusion generative models have achieved considerable success in many applications. Score-based generative stochastic differential equations (SDE) model is a continuous-time diffusion model framework that enjoys many benefits, but the originally proposed SDEs are not naturally designed for modeling discrete data. To develop generative SDE models for discrete data such as biological sequences, here we introduce a diffusion process defined in the probability simplex space with stationary distribution being the Dirichlet distribution. This makes diffusion in continuous space natural for modeling discrete data. We refer to this approach as Dirichlet diffusion score model. We demonstrate that this technique can generate samples that satisfy hard constraints using a Sudoku generation task. This generative model can also solve Sudoku, including hard puzzles, without additional training. Finally, we applied this approach to develop the first human promoter DNA sequence design model and showed that designed sequences share similar properties with natural promoter sequences. |
format | Online Article Text |
id | pubmed-10246113 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2023 |
publisher | Cornell University |
record_format | MEDLINE/PubMed |
spelling | pubmed-10246113 2023-06-08 Dirichlet Diffusion Score Model for Biological Sequence Generation Avdeyev, Pavel Shi, Chenlai Tan, Yuhao Dudnyk, Kseniia Zhou, Jian ArXiv Article Designing biological sequences is an important challenge that requires satisfying complex constraints and thus is a natural problem to address with deep generative modeling. Diffusion generative models have achieved considerable success in many applications. Score-based generative stochastic differential equations (SDE) model is a continuous-time diffusion model framework that enjoys many benefits, but the originally proposed SDEs are not naturally designed for modeling discrete data. To develop generative SDE models for discrete data such as biological sequences, here we introduce a diffusion process defined in the probability simplex space with stationary distribution being the Dirichlet distribution. This makes diffusion in continuous space natural for modeling discrete data. We refer to this approach as Dirichlet diffusion score model. We demonstrate that this technique can generate samples that satisfy hard constraints using a Sudoku generation task. This generative model can also solve Sudoku, including hard puzzles, without additional training. Finally, we applied this approach to develop the first human promoter DNA sequence design model and showed that designed sequences share similar properties with natural promoter sequences. Cornell University 2023-06-16 /pmc/articles/PMC10246113/ /pubmed/37292476 Text en https://creativecommons.org/licenses/by/4.0/ This work is licensed under a Creative Commons Attribution 4.0 International License (https://creativecommons.org/licenses/by/4.0/), which allows reusers to distribute, remix, adapt, and build upon the material in any medium or format, so long as attribution is given to the creator. The license allows for commercial use. |
spellingShingle | Article Avdeyev, Pavel Shi, Chenlai Tan, Yuhao Dudnyk, Kseniia Zhou, Jian Dirichlet Diffusion Score Model for Biological Sequence Generation |
title | Dirichlet Diffusion Score Model for Biological Sequence Generation |
title_full | Dirichlet Diffusion Score Model for Biological Sequence Generation |
title_fullStr | Dirichlet Diffusion Score Model for Biological Sequence Generation |
title_full_unstemmed | Dirichlet Diffusion Score Model for Biological Sequence Generation |
title_short | Dirichlet Diffusion Score Model for Biological Sequence Generation |
title_sort | dirichlet diffusion score model for biological sequence generation |
topic | Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10246113/ https://www.ncbi.nlm.nih.gov/pubmed/37292476 |
work_keys_str_mv | AT avdeyevpavel dirichletdiffusionscoremodelforbiologicalsequencegeneration AT shichenlai dirichletdiffusionscoremodelforbiologicalsequencegeneration AT tanyuhao dirichletdiffusionscoremodelforbiologicalsequencegeneration AT dudnykkseniia dirichletdiffusionscoremodelforbiologicalsequencegeneration AT zhoujian dirichletdiffusionscoremodelforbiologicalsequencegeneration |