Prediction of Transcription Factor Binding Sites Using a Combined Deep Learning Approach

In the process of regulating gene expression and evolution, such as DNA replication and mRNA transcription, the binding of transcription factors (TFs) to TF binding sites (TFBS) plays a vital role. Precisely modeling the specificity of genes and searching for TFBS are helpful to explore the mechanis...

Bibliographic Details
Main Authors: Cao, Linan; Liu, Pei; Chen, Jialong; Deng, Lei
Format: Online Article Text
Language: English
Published: Frontiers Media S.A. 2022
Subjects: Oncology
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9204005/
https://www.ncbi.nlm.nih.gov/pubmed/35719916
http://dx.doi.org/10.3389/fonc.2022.893520
_version_ 1784728817097506816
author Cao, Linan
Liu, Pei
Chen, Jialong
Deng, Lei
author_facet Cao, Linan
Liu, Pei
Chen, Jialong
Deng, Lei
author_sort Cao, Linan
collection PubMed
description In the process of regulating gene expression and evolution, such as DNA replication and mRNA transcription, the binding of transcription factors (TFs) to TF binding sites (TFBS) plays a vital role. Precisely modeling the specificity of genes and searching for TFBS are helpful to explore the mechanism of cell expression. In recent years, computational and deep learning methods searching for TFBS have become an active field of research. However, existing methods generally cannot meet high performance and interpretability simultaneously. Here, we develop an accurate and interpretable attention-based hybrid approach, DeepARC, that combines a convolutional neural network (CNN) and recurrent neural network (RNN) to predict TFBS. DeepARC employs a positional embedding method to extract the hidden embedding from DNA sequences, including the positional information from OneHot encoding and the distributed embedding from DNA2Vec. DeepARC feeds the positional embedding of the DNA sequence into a CNN-BiLSTM-Attention-based framework to complete the task of finding the motif. Taking advantage of the attention mechanism, DeepARC can gain greater access to valuable information about the motif and bring interpretability to the work of searching for motifs through the attention weight graph. Moreover, DeepARC achieves promising performances with an average area under the receiver operating characteristic curve (AUC) score of 0.908 on five cell lines (A549, GM12878, Hep-G2, H1-hESC, and Hela) in the benchmark dataset. We also compare the positional embedding with OneHot and DNA2Vec and gain a competitive advantage.
format Online
Article
Text
id pubmed-9204005
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher Frontiers Media S.A.
record_format MEDLINE/PubMed
spelling pubmed-9204005 2022-06-18 Prediction of Transcription Factor Binding Sites Using a Combined Deep Learning Approach Cao, Linan Liu, Pei Chen, Jialong Deng, Lei Front Oncol Oncology Frontiers Media S.A. 2022-06-03 /pmc/articles/PMC9204005/ /pubmed/35719916 http://dx.doi.org/10.3389/fonc.2022.893520 Text en Copyright © 2022 Cao, Liu, Chen and Deng https://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
spellingShingle Oncology
Cao, Linan
Liu, Pei
Chen, Jialong
Deng, Lei
Prediction of Transcription Factor Binding Sites Using a Combined Deep Learning Approach
title Prediction of Transcription Factor Binding Sites Using a Combined Deep Learning Approach
title_full Prediction of Transcription Factor Binding Sites Using a Combined Deep Learning Approach
title_fullStr Prediction of Transcription Factor Binding Sites Using a Combined Deep Learning Approach
title_full_unstemmed Prediction of Transcription Factor Binding Sites Using a Combined Deep Learning Approach
title_short Prediction of Transcription Factor Binding Sites Using a Combined Deep Learning Approach
title_sort prediction of transcription factor binding sites using a combined deep learning approach
topic Oncology
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9204005/
https://www.ncbi.nlm.nih.gov/pubmed/35719916
http://dx.doi.org/10.3389/fonc.2022.893520
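
The abstract in this record says DeepARC builds its input from OneHot encoding combined with a distributed DNA2Vec embedding. As a rough illustration of those two ingredients only, the sketch below one-hot encodes a DNA string and splits it into overlapping k-mers, which is the kind of token a DNA2Vec-style lookup would embed. The function names, the A/C/G/T column order, the all-zero row for unknown bases, and k = 3 are assumptions made for this example; the record does not state how the authors implement or combine the two encodings.

```python
from typing import List

# Assumed column order for the one-hot matrix (not specified in the record).
BASES = "ACGT"


def one_hot(seq: str) -> List[List[int]]:
    """Encode a DNA string as an L x 4 matrix of 0/1 values.

    Bases outside A/C/G/T (for example N) become an all-zero row;
    this handling is an assumption for the sketch.
    """
    rows = []
    for base in seq.upper():
        row = [0, 0, 0, 0]
        idx = BASES.find(base)
        if idx >= 0:
            row[idx] = 1
        rows.append(row)
    return rows


def kmers(seq: str, k: int = 3) -> List[str]:
    """Split a sequence into overlapping k-mers with stride 1.

    A DNA2Vec-style model would map each k-mer to a dense vector;
    that lookup table is not reproduced here.
    """
    seq = seq.upper()
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


if __name__ == "__main__":
    example = "ACGTA"  # hypothetical input sequence
    print(one_hot(example))
    print(kmers(example, k=3))
```

A sequence of length L yields an L x 4 one-hot matrix and L - k + 1 k-mers, so any scheme that merges the two per position has to decide how to align them (for instance by padding); that choice is left open here.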