Cargando…

iEnhancer-GAN: A Deep Learning Framework in Combination with Word Embedding and Sequence Generative Adversarial Net to Identify Enhancers and Their Strength

As critical components of DNA, enhancers can efficiently and specifically manipulate the spatial and temporal regulation of gene transcription. Malfunction or dysregulation of enhancers is implicated in a slew of human pathology. Therefore, identifying enhancers and their strength may provide insigh...

Descripción completa

Detalles Bibliográficos
Autores principales: Yang, Runtao, Wu, Feng, Zhang, Chengjin, Zhang, Lina
Formato: Online Artículo Texto
Lenguaje:English
Publicado: MDPI 2021
Materias:
Acceso en línea:https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8036415/
https://www.ncbi.nlm.nih.gov/pubmed/33808317
http://dx.doi.org/10.3390/ijms22073589
_version_ 1783676904785575936
author Yang, Runtao
Wu, Feng
Zhang, Chengjin
Zhang, Lina
author_facet Yang, Runtao
Wu, Feng
Zhang, Chengjin
Zhang, Lina
author_sort Yang, Runtao
collection PubMed
description As critical components of DNA, enhancers can efficiently and specifically manipulate the spatial and temporal regulation of gene transcription. Malfunction or dysregulation of enhancers is implicated in a slew of human pathology. Therefore, identifying enhancers and their strength may provide insights into the molecular mechanisms of gene transcription and facilitate the discovery of candidate drug targets. In this paper, a new enhancer and its strength predictor, iEnhancer-GAN, is proposed based on a deep learning framework in combination with the word embedding and sequence generative adversarial net (Seq-GAN). Considering the relatively small training dataset, the Seq-GAN is designed to generate artificial sequences. Given that each functional element in DNA sequences is analogous to a “word” in linguistics, the word segmentation methods are proposed to divide DNA sequences into “words”, and the skip-gram model is employed to transform the “words” into digital vectors. In view of the powerful ability to extract high-level abstraction features, a convolutional neural network (CNN) architecture is constructed to perform the identification tasks, and the word vectors of DNA sequences are vertically concatenated to form the embedding matrices as the input of the CNN. Experimental results demonstrate the effectiveness of the Seq-GAN to expand the training dataset, the possibility of applying word segmentation methods to extract “words” from DNA sequences, the feasibility of implementing the skip-gram model to encode DNA sequences, and the powerful prediction ability of the CNN. Compared with other state-of-the-art methods on the training dataset and independent test dataset, the proposed method achieves a significantly improved overall performance. It is anticipated that the proposed method has a certain promotion effect on enhancer related fields.
format Online
Article
Text
id pubmed-8036415
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher MDPI
record_format MEDLINE/PubMed
spelling pubmed-80364152021-04-12 iEnhancer-GAN: A Deep Learning Framework in Combination with Word Embedding and Sequence Generative Adversarial Net to Identify Enhancers and Their Strength Yang, Runtao Wu, Feng Zhang, Chengjin Zhang, Lina Int J Mol Sci Article As critical components of DNA, enhancers can efficiently and specifically manipulate the spatial and temporal regulation of gene transcription. Malfunction or dysregulation of enhancers is implicated in a slew of human pathology. Therefore, identifying enhancers and their strength may provide insights into the molecular mechanisms of gene transcription and facilitate the discovery of candidate drug targets. In this paper, a new enhancer and its strength predictor, iEnhancer-GAN, is proposed based on a deep learning framework in combination with the word embedding and sequence generative adversarial net (Seq-GAN). Considering the relatively small training dataset, the Seq-GAN is designed to generate artificial sequences. Given that each functional element in DNA sequences is analogous to a “word” in linguistics, the word segmentation methods are proposed to divide DNA sequences into “words”, and the skip-gram model is employed to transform the “words” into digital vectors. In view of the powerful ability to extract high-level abstraction features, a convolutional neural network (CNN) architecture is constructed to perform the identification tasks, and the word vectors of DNA sequences are vertically concatenated to form the embedding matrices as the input of the CNN. Experimental results demonstrate the effectiveness of the Seq-GAN to expand the training dataset, the possibility of applying word segmentation methods to extract “words” from DNA sequences, the feasibility of implementing the skip-gram model to encode DNA sequences, and the powerful prediction ability of the CNN. Compared with other state-of-the-art methods on the training dataset and independent test dataset, the proposed method achieves a significantly improved overall performance. It is anticipated that the proposed method has a certain promotion effect on enhancer related fields. MDPI 2021-03-30 /pmc/articles/PMC8036415/ /pubmed/33808317 http://dx.doi.org/10.3390/ijms22073589 Text en © 2021 by the authors. https://creativecommons.org/licenses/by/4.0/Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
spellingShingle Article
Yang, Runtao
Wu, Feng
Zhang, Chengjin
Zhang, Lina
iEnhancer-GAN: A Deep Learning Framework in Combination with Word Embedding and Sequence Generative Adversarial Net to Identify Enhancers and Their Strength
title iEnhancer-GAN: A Deep Learning Framework in Combination with Word Embedding and Sequence Generative Adversarial Net to Identify Enhancers and Their Strength
title_full iEnhancer-GAN: A Deep Learning Framework in Combination with Word Embedding and Sequence Generative Adversarial Net to Identify Enhancers and Their Strength
title_fullStr iEnhancer-GAN: A Deep Learning Framework in Combination with Word Embedding and Sequence Generative Adversarial Net to Identify Enhancers and Their Strength
title_full_unstemmed iEnhancer-GAN: A Deep Learning Framework in Combination with Word Embedding and Sequence Generative Adversarial Net to Identify Enhancers and Their Strength
title_short iEnhancer-GAN: A Deep Learning Framework in Combination with Word Embedding and Sequence Generative Adversarial Net to Identify Enhancers and Their Strength
title_sort ienhancer-gan: a deep learning framework in combination with word embedding and sequence generative adversarial net to identify enhancers and their strength
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8036415/
https://www.ncbi.nlm.nih.gov/pubmed/33808317
http://dx.doi.org/10.3390/ijms22073589
work_keys_str_mv AT yangruntao ienhancerganadeeplearningframeworkincombinationwithwordembeddingandsequencegenerativeadversarialnettoidentifyenhancersandtheirstrength
AT wufeng ienhancerganadeeplearningframeworkincombinationwithwordembeddingandsequencegenerativeadversarialnettoidentifyenhancersandtheirstrength
AT zhangchengjin ienhancerganadeeplearningframeworkincombinationwithwordembeddingandsequencegenerativeadversarialnettoidentifyenhancersandtheirstrength
AT zhanglina ienhancerganadeeplearningframeworkincombinationwithwordembeddingandsequencegenerativeadversarialnettoidentifyenhancersandtheirstrength