G4mismatch: Deep neural networks to predict G-quadruplex propensity based on G4-seq data
G-quadruplexes are non-B-DNA structures that form in the genome facilitated by Hoogsteen bonds between guanines in single or multiple strands of DNA. The functions of G-quadruplexes are linked to various molecular and disease phenotypes, and thus researchers are interested in measuring G-quadruplex...
Main authors: | Barshai, Mira; Engel, Barak; Haim, Idan; Orenstein, Yaron |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Public Library of Science 2023 |
Subjects: | Research Article |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10079223/ https://www.ncbi.nlm.nih.gov/pubmed/36897885 http://dx.doi.org/10.1371/journal.pcbi.1010948 |
_version_ | 1785020686943649792 |
---|---|
author | Barshai, Mira Engel, Barak Haim, Idan Orenstein, Yaron |
author_facet | Barshai, Mira Engel, Barak Haim, Idan Orenstein, Yaron |
author_sort | Barshai, Mira |
collection | PubMed |
description | G-quadruplexes are non-B-DNA structures that form in the genome facilitated by Hoogsteen bonds between guanines in single or multiple strands of DNA. The functions of G-quadruplexes are linked to various molecular and disease phenotypes, and thus researchers are interested in measuring G-quadruplex formation genome-wide. Experimentally measuring G-quadruplexes is a long and laborious process. Computational prediction of G-quadruplex propensity from a given DNA sequence is thus a long-standing challenge. Unfortunately, despite the availability of high-throughput datasets measuring G-quadruplex propensity in the form of mismatch scores, extant methods to predict G-quadruplex formation either rely on small datasets or are based on domain-knowledge rules. We developed G4mismatch, a novel algorithm to accurately and efficiently predict G-quadruplex propensity for any genomic sequence. G4mismatch is based on a convolutional neural network trained on almost 400 million human genomic loci measured in a single G4-seq experiment. When tested on sequences from a held-out chromosome, G4mismatch, the first method to predict mismatch scores genome-wide, achieved a Pearson correlation of over 0.8. When benchmarked on independent datasets derived from various animal species, G4mismatch trained on human data predicted G-quadruplex propensity genome-wide with high accuracy (Pearson correlations greater than 0.7). Moreover, when tested in detecting G-quadruplexes genome-wide using the predicted mismatch scores, G4mismatch achieved superior performance compared to extant methods. Last, we demonstrate the ability to deduce the mechanism behind G-quadruplex formation by unique visualization of the principles learned by the model. |
format | Online Article Text |
id | pubmed-10079223 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2023 |
publisher | Public Library of Science |
record_format | MEDLINE/PubMed |
spelling | pubmed-10079223 2023-04-07 G4mismatch: Deep neural networks to predict G-quadruplex propensity based on G4-seq data Barshai, Mira Engel, Barak Haim, Idan Orenstein, Yaron PLoS Comput Biol Research Article G-quadruplexes are non-B-DNA structures that form in the genome facilitated by Hoogsteen bonds between guanines in single or multiple strands of DNA. The functions of G-quadruplexes are linked to various molecular and disease phenotypes, and thus researchers are interested in measuring G-quadruplex formation genome-wide. Experimentally measuring G-quadruplexes is a long and laborious process. Computational prediction of G-quadruplex propensity from a given DNA sequence is thus a long-standing challenge. Unfortunately, despite the availability of high-throughput datasets measuring G-quadruplex propensity in the form of mismatch scores, extant methods to predict G-quadruplex formation either rely on small datasets or are based on domain-knowledge rules. We developed G4mismatch, a novel algorithm to accurately and efficiently predict G-quadruplex propensity for any genomic sequence. G4mismatch is based on a convolutional neural network trained on almost 400 million human genomic loci measured in a single G4-seq experiment. When tested on sequences from a held-out chromosome, G4mismatch, the first method to predict mismatch scores genome-wide, achieved a Pearson correlation of over 0.8. When benchmarked on independent datasets derived from various animal species, G4mismatch trained on human data predicted G-quadruplex propensity genome-wide with high accuracy (Pearson correlations greater than 0.7). Moreover, when tested in detecting G-quadruplexes genome-wide using the predicted mismatch scores, G4mismatch achieved superior performance compared to extant methods. Last, we demonstrate the ability to deduce the mechanism behind G-quadruplex formation by unique visualization of the principles learned by the model. Public Library of Science 2023-03-10 /pmc/articles/PMC10079223/ /pubmed/36897885 http://dx.doi.org/10.1371/journal.pcbi.1010948 Text en © 2023 Barshai et al https://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. |
spellingShingle | Research Article Barshai, Mira Engel, Barak Haim, Idan Orenstein, Yaron G4mismatch: Deep neural networks to predict G-quadruplex propensity based on G4-seq data |
title | G4mismatch: Deep neural networks to predict G-quadruplex propensity based on G4-seq data |
title_full | G4mismatch: Deep neural networks to predict G-quadruplex propensity based on G4-seq data |
title_fullStr | G4mismatch: Deep neural networks to predict G-quadruplex propensity based on G4-seq data |
title_full_unstemmed | G4mismatch: Deep neural networks to predict G-quadruplex propensity based on G4-seq data |
title_short | G4mismatch: Deep neural networks to predict G-quadruplex propensity based on G4-seq data |
title_sort | g4mismatch: deep neural networks to predict g-quadruplex propensity based on g4-seq data |
topic | Research Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10079223/ https://www.ncbi.nlm.nih.gov/pubmed/36897885 http://dx.doi.org/10.1371/journal.pcbi.1010948 |
work_keys_str_mv | AT barshaimira g4mismatchdeepneuralnetworkstopredictgquadruplexpropensitybasedong4seqdata AT engelbarak g4mismatchdeepneuralnetworkstopredictgquadruplexpropensitybasedong4seqdata AT haimidan g4mismatchdeepneuralnetworkstopredictgquadruplexpropensitybasedong4seqdata AT orensteinyaron g4mismatchdeepneuralnetworkstopredictgquadruplexpropensitybasedong4seqdata |
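
The description in this record reports results as Pearson correlations between predicted and measured mismatch scores for DNA sequences. As a rough illustration of those two ingredients only, the sketch below one-hot encodes a DNA string (a common input format for convolutional networks, assumed here rather than confirmed by the record) and computes a Pearson correlation. The helper names `one_hot` and `pearson` and all score values are hypothetical and are not taken from the G4mismatch software.

```python
import math

# Hypothetical helpers for illustration; not part of the G4mismatch code base.
BASES = "ACGT"


def one_hot(seq):
    """Encode a DNA string as rows of 4 floats (A, C, G, T).

    Characters outside ACGT (for example N) become all-zero rows.
    """
    return [[1.0 if base == b else 0.0 for b in BASES] for base in seq.upper()]


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length score lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of equal length with at least 2 values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


if __name__ == "__main__":
    # Made-up scores, only to show the call pattern.
    measured = [2.0, 15.5, 40.0, 3.1]
    predicted = [4.0, 12.0, 35.0, 5.0]
    print(one_hot("GGGT")[0])
    print(round(pearson(measured, predicted), 3))
```

In practice the predicted list would come from a trained model and the measured list from a G4-seq experiment; both are placeholders here.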