Prediction of Transcription Factor Binding Sites Using a Combined Deep Learning Approach
In the process of regulating gene expression and evolution, such as DNA replication and mRNA transcription, the binding of transcription factors (TFs) to TF binding sites (TFBS) plays a vital role. Precisely modeling the specificity of genes and searching for TFBS are helpful to explore the mechanis...
Main authors: | Cao, Linan; Liu, Pei; Chen, Jialong; Deng, Lei |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Frontiers Media S.A. 2022 |
Subjects: | Oncology |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9204005/ https://www.ncbi.nlm.nih.gov/pubmed/35719916 http://dx.doi.org/10.3389/fonc.2022.893520 |
_version_ | 1784728817097506816 |
---|---|
author | Cao, Linan Liu, Pei Chen, Jialong Deng, Lei |
author_facet | Cao, Linan Liu, Pei Chen, Jialong Deng, Lei |
author_sort | Cao, Linan |
collection | PubMed |
description | In the process of regulating gene expression and evolution, such as DNA replication and mRNA transcription, the binding of transcription factors (TFs) to TF binding sites (TFBS) plays a vital role. Precisely modeling the specificity of genes and searching for TFBS are helpful to explore the mechanism of cell expression. In recent years, computational and deep learning methods searching for TFBS have become an active field of research. However, existing methods generally cannot achieve high performance and interpretability simultaneously. Here, we develop an accurate and interpretable attention-based hybrid approach, DeepARC, that combines a convolutional neural network (CNN) and recurrent neural network (RNN) to predict TFBS. DeepARC employs a positional embedding method to extract the hidden embedding from DNA sequences, including the positional information from OneHot encoding and the distributed embedding from DNA2Vec. DeepARC feeds the positional embedding of the DNA sequence into a CNN-BiLSTM-Attention-based framework to complete the task of finding the motif. Taking advantage of the attention mechanism, DeepARC can gain greater access to valuable information about the motif and bring interpretability to the work of searching for motifs through the attention weight graph. Moreover, DeepARC achieves promising performance with an average area under the receiver operating characteristic curve (AUC) score of 0.908 on five cell lines (A549, GM12878, Hep-G2, H1-hESC, and HeLa) in the benchmark dataset. We also compare the positional embedding with OneHot and DNA2Vec and gain a competitive advantage. |
format | Online Article Text |
id | pubmed-9204005 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2022 |
publisher | Frontiers Media S.A. |
record_format | MEDLINE/PubMed |
spelling | pubmed-92040052022-06-18 Prediction of Transcription Factor Binding Sites Using a Combined Deep Learning Approach Cao, Linan Liu, Pei Chen, Jialong Deng, Lei Front Oncol Oncology In the process of regulating gene expression and evolution, such as DNA replication and mRNA transcription, the binding of transcription factors (TFs) to TF binding sites (TFBS) plays a vital role. Precisely modeling the specificity of genes and searching for TFBS are helpful to explore the mechanism of cell expression. In recent years, computational and deep learning methods searching for TFBS have become an active field of research. However, existing methods generally cannot meet high performance and interpretability simultaneously. Here, we develop an accurate and interpretable attention-based hybrid approach, DeepARC, that combines a convolutional neural network (CNN) and recurrent neural network (RNN) to predict TFBS. DeepARC employs a positional embedding method to extract the hidden embedding from DNA sequences, including the positional information from OneHot encoding and the distributed embedding from DNA2Vec. DeepARC feeds the positional embedding of the DNA sequence into a CNN-BiLSTM-Attention-based framework to complete the task of finding the motif. Taking advantage of the attention mechanism, DeepARC can gain greater access to valuable information about the motif and bring interpretability to the work of searching for motifs through the attention weight graph. Moreover, DeepARC achieves promising performances with an average area under the receiver operating characteristic curve (AUC) score of 0.908 on five cell lines (A549, GM12878, Hep-G2, H1-hESC, and Hela) in the benchmark dataset. We also compare the positional embedding with OneHot and DNA2Vec and gain a competitive advantage. Frontiers Media S.A. 
2022-06-03 /pmc/articles/PMC9204005/ /pubmed/35719916 http://dx.doi.org/10.3389/fonc.2022.893520 Text en Copyright © 2022 Cao, Liu, Chen and Deng https://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms. |
spellingShingle | Oncology Cao, Linan Liu, Pei Chen, Jialong Deng, Lei Prediction of Transcription Factor Binding Sites Using a Combined Deep Learning Approach |
title | Prediction of Transcription Factor Binding Sites Using a Combined Deep Learning Approach |
title_full | Prediction of Transcription Factor Binding Sites Using a Combined Deep Learning Approach |
title_fullStr | Prediction of Transcription Factor Binding Sites Using a Combined Deep Learning Approach |
title_full_unstemmed | Prediction of Transcription Factor Binding Sites Using a Combined Deep Learning Approach |
title_short | Prediction of Transcription Factor Binding Sites Using a Combined Deep Learning Approach |
title_sort | prediction of transcription factor binding sites using a combined deep learning approach |
topic | Oncology |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9204005/ https://www.ncbi.nlm.nih.gov/pubmed/35719916 http://dx.doi.org/10.3389/fonc.2022.893520 |
work_keys_str_mv | AT caolinan predictionoftranscriptionfactorbindingsitesusingacombineddeeplearningapproach AT liupei predictionoftranscriptionfactorbindingsitesusingacombineddeeplearningapproach AT chenjialong predictionoftranscriptionfactorbindingsitesusingacombineddeeplearningapproach AT denglei predictionoftranscriptionfactorbindingsitesusingacombineddeeplearningapproach |
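
The description field above mentions two input representations for DNA sequences: OneHot encoding and a distributed embedding from DNA2Vec, which operates on k-mers. The following is a minimal sketch of what those two preprocessing steps could look like, not the DeepARC implementation; the function names `one_hot` and `kmers`, the base order `ACGT`, the all-zero row for unknown bases, and the default `k=3` are all assumptions made for illustration.

```python
# Illustrative sketch only; names and defaults are hypothetical,
# not taken from the DeepARC code base.
BASES = "ACGT"  # assumed column order for the one-hot matrix


def one_hot(seq):
    """Return one 4-element row per base.

    Bases outside ACGT (for example N) become an all-zero row.
    """
    return [[1 if base == b else 0 for b in BASES] for base in seq.upper()]


def kmers(seq, k=3):
    """Return the overlapping k-mers of seq.

    These are the tokens a DNA2Vec-style model would map to vectors.
    """
    seq = seq.upper()
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]
```

A sequence of length n yields n one-hot rows and n - k + 1 overlapping k-mers, so a model that combines both representations would need to align or pad them; how DeepARC does this is not stated in this record.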