A generative model for constructing nucleic acid sequences binding to a protein

BACKGROUND: Interactions between protein and nucleic acid molecules are essential to a variety of cellular processes. A large amount of interaction data generated by high-throughput technologies has triggered the development of several computational methods either to predict binding sites in a sequ...

Bibliographic Details
Main Authors: Im, Jinho; Park, Byungkyu; Han, Kyungsook
Format: Online Article Text
Language: English
Published: BioMed Central 2019
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6933682/
https://www.ncbi.nlm.nih.gov/pubmed/31881936
http://dx.doi.org/10.1186/s12864-019-6299-4
_version_ 1783483257292062720
author Im, Jinho
Park, Byungkyu
Han, Kyungsook
author_facet Im, Jinho
Park, Byungkyu
Han, Kyungsook
author_sort Im, Jinho
collection PubMed
description BACKGROUND: Interactions between protein and nucleic acid molecules are essential to a variety of cellular processes. A large amount of interaction data generated by high-throughput technologies has triggered the development of several computational methods either to predict binding sites in a sequence or to determine whether a pair of sequences interacts or not. Most of these methods treat the problem of the interaction of nucleic acids with proteins as a classification problem rather than a generation problem. RESULTS: We developed a generative model for constructing single-stranded nucleic acids binding to a target protein using a long short-term memory (LSTM) neural network. Experimental results of the generative model are promising in the sense that DNA and RNA sequences generated by the model for several target proteins show high specificity and that motifs present in the generated sequences are similar to known protein-binding motifs. CONCLUSIONS: Although these are preliminary results of our ongoing research, our approach can be used to generate nucleic acid sequences binding to a target protein. In particular, it will help design efficient in vitro experiments by constructing an initial pool of potential aptamers that bind to a target protein with high affinity and specificity.
format Online
Article
Text
id pubmed-6933682
institution National Center for Biotechnology Information
language English
publishDate 2019
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-6933682 2019-12-30 A generative model for constructing nucleic acid sequences binding to a protein Im, Jinho Park, Byungkyu Han, Kyungsook BMC Genomics Research BACKGROUND: Interactions between protein and nucleic acid molecules are essential to a variety of cellular processes. A large amount of interaction data generated by high-throughput technologies has triggered the development of several computational methods either to predict binding sites in a sequence or to determine whether a pair of sequences interacts or not. Most of these methods treat the problem of the interaction of nucleic acids with proteins as a classification problem rather than a generation problem. RESULTS: We developed a generative model for constructing single-stranded nucleic acids binding to a target protein using a long short-term memory (LSTM) neural network. Experimental results of the generative model are promising in the sense that DNA and RNA sequences generated by the model for several target proteins show high specificity and that motifs present in the generated sequences are similar to known protein-binding motifs. CONCLUSIONS: Although these are preliminary results of our ongoing research, our approach can be used to generate nucleic acid sequences binding to a target protein. In particular, it will help design efficient in vitro experiments by constructing an initial pool of potential aptamers that bind to a target protein with high affinity and specificity.
BioMed Central 2019-12-27 /pmc/articles/PMC6933682/ /pubmed/31881936 http://dx.doi.org/10.1186/s12864-019-6299-4 Text en © The Author(s) 2019 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
spellingShingle Research
Im, Jinho
Park, Byungkyu
Han, Kyungsook
A generative model for constructing nucleic acid sequences binding to a protein
title A generative model for constructing nucleic acid sequences binding to a protein
title_full A generative model for constructing nucleic acid sequences binding to a protein
title_fullStr A generative model for constructing nucleic acid sequences binding to a protein
title_full_unstemmed A generative model for constructing nucleic acid sequences binding to a protein
title_short A generative model for constructing nucleic acid sequences binding to a protein
title_sort generative model for constructing nucleic acid sequences binding to a protein
topic Research
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6933682/
https://www.ncbi.nlm.nih.gov/pubmed/31881936
http://dx.doi.org/10.1186/s12864-019-6299-4
work_keys_str_mv AT imjinho agenerativemodelforconstructingnucleicacidsequencesbindingtoaprotein
AT parkbyungkyu agenerativemodelforconstructingnucleicacidsequencesbindingtoaprotein
AT hankyungsook agenerativemodelforconstructingnucleicacidsequencesbindingtoaprotein
AT imjinho generativemodelforconstructingnucleicacidsequencesbindingtoaprotein
AT parkbyungkyu generativemodelforconstructingnucleicacidsequencesbindingtoaprotein
AT hankyungsook generativemodelforconstructingnucleicacidsequencesbindingtoaprotein