
DeepG4: A deep learning approach to predict cell-type specific active G-quadruplex regions

DNA is a complex molecule carrying the instructions an organism needs to develop, live and reproduce. In 1953, Watson and Crick discovered that DNA is composed of two chains forming a double-helix. Later on, other structures of DNA were discovered and shown to play important roles in the cell, in pa...


Bibliographic Details
Main Authors: Rocher, Vincent; Genais, Matthieu; Nassereddine, Elissar; Mourad, Raphael
Format: Online Article Text
Language: English
Published: Public Library of Science 2021
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8384162/
https://www.ncbi.nlm.nih.gov/pubmed/34383754
http://dx.doi.org/10.1371/journal.pcbi.1009308
_version_ 1783741860506763264
author Rocher, Vincent
Genais, Matthieu
Nassereddine, Elissar
Mourad, Raphael
collection PubMed
description DNA is a complex molecule carrying the instructions an organism needs to develop, live and reproduce. In 1953, Watson and Crick discovered that DNA is composed of two chains forming a double helix. Later, other DNA structures were discovered and shown to play important roles in the cell, in particular the G-quadruplex (G4). Following genome sequencing, several bioinformatic algorithms were developed to map G4s in vitro based on a canonical sequence motif, G-richness and G-skewness, or alternatively on sequence features including k-mers, and more recently on machine/deep learning. Recently, new sequencing techniques were developed to map G4s in vitro (G4-seq) and in vivo (G4 ChIP-seq) at a resolution of a few hundred bases. Here, we propose a novel convolutional neural network (DeepG4) to map cell-type specific active G4 regions (i.e. regions within which G4s form both in vitro and in vivo). DeepG4 predicts active G4 regions accurately in different cell types. Moreover, DeepG4 identifies key DNA motifs that are predictive of G4 region activity. We found that such motifs do not follow the very flexible sequence pattern that current algorithms search for. Instead, active G4 regions are determined by numerous specific motifs. Among those motifs, we identified known transcription factors (TFs) that could play important roles in G4 activity, either by contributing directly to the G4 structures themselves or indirectly by participating in G4 formation in the vicinity. In addition, we used DeepG4 to predict active G4 regions in a large number of tissues and cancers, thereby providing a comprehensive resource for researchers. Availability: https://github.com/morphos30/DeepG4.
format Online
Article
Text
id pubmed-8384162
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-8384162 2021-08-25 DeepG4: A deep learning approach to predict cell-type specific active G-quadruplex regions Rocher, Vincent Genais, Matthieu Nassereddine, Elissar Mourad, Raphael PLoS Comput Biol Research Article
Public Library of Science 2021-08-12 /pmc/articles/PMC8384162/ /pubmed/34383754 http://dx.doi.org/10.1371/journal.pcbi.1009308 Text en © 2021 Rocher et al. https://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
title DeepG4: A deep learning approach to predict cell-type specific active G-quadruplex regions
topic Research Article