
Training-free measures based on algorithmic probability identify high nucleosome occupancy in DNA sequences


Bibliographic details
Main authors: Zenil, Hector, Minary, Peter
Format: Online Article Text
Language: English
Published: Oxford University Press 2019
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6846163/
https://www.ncbi.nlm.nih.gov/pubmed/31511887
http://dx.doi.org/10.1093/nar/gkz750
author Zenil, Hector
Minary, Peter
collection PubMed
description We introduce and study a set of training-free methods of an information-theoretic and algorithmic complexity nature that we apply to DNA sequences to assess their potential to identify nucleosomal binding sites. We test the measures on well-studied genomic sequences of different sizes drawn from different sources. The measures reveal the known in vivo versus in vitro predictive discrepancies and uncover their potential to pinpoint high and low nucleosome occupancy. We explore different possible signals within and beyond the nucleosome length and find that the complexity indices are informative of nucleosome occupancy. We find that, while the gold-standard Kaplan model is clearly driven by GC content (by design) and by k-mer training, entropy and complexity-based scores are also informative for high occupancy and can complement the Kaplan model.
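The abstract names GC content and entropy-based scores without showing how such per-window scores might be computed. As a rough illustration only, the sketch below computes GC content and the Shannon entropy of the overlapping k-mer distribution over sliding windows of a DNA string. The function names (`gc_content`, `kmer_entropy`, `sliding_scores`), the default 147 bp window (roughly the length of DNA wrapped around a nucleosome core), and the step size are assumptions for illustration. This is not the authors' implementation, and it does not compute the algorithmic-probability-based measures named in the title.

```python
import math
from collections import Counter


def gc_content(seq: str) -> float:
    """Fraction of G and C bases in seq (case-insensitive); 0.0 for empty input."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def kmer_entropy(seq: str, k: int = 1) -> float:
    """Shannon entropy, in bits, of the overlapping k-mer distribution of seq."""
    seq = seq.upper()
    kmers = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    if not kmers:
        return 0.0
    counts = Counter(kmers)
    total = len(kmers)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def sliding_scores(seq: str, window: int = 147, step: int = 1, k: int = 1):
    """Yield (start, gc, entropy) for every full window of length `window`.

    The 147 bp default is an assumed nucleosome-scale window, not a value
    taken from the paper.
    """
    for start in range(0, len(seq) - window + 1, step):
        w = seq[start:start + window]
        yield start, gc_content(w), kmer_entropy(w, k)


if __name__ == "__main__":
    # Toy 200 bp sequence for demonstration only, not real genomic data.
    demo = "ACGT" * 50
    for start, gc, h in sliding_scores(demo, window=147, step=25):
        print(start, round(gc, 3), round(h, 3))
```

A low-entropy window (for example, a long poly-A tract) scores near 0 bits for k = 1, while a window using all four bases equally scores 2 bits. Raising `k` makes the score sensitive to short repeated motifs rather than to base composition alone.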
format Online
Article
Text
id pubmed-6846163
institution National Center for Biotechnology Information
language English
publishDate 2019
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-68461632019-11-18 Training-free measures based on algorithmic probability identify high nucleosome occupancy in DNA sequences Zenil, Hector Minary, Peter Nucleic Acids Res Methods Online We introduce and study a set of training-free methods of an information-theoretic and algorithmic complexity nature that we apply to DNA sequences to identify their potential to identify nucleosomal binding sites. We test the measures on well-studied genomic sequences of different sizes drawn from different sources. The measures reveal the known in vivo versus in vitro predictive discrepancies and uncover their potential to pinpoint high and low nucleosome occupancy. We explore different possible signals within and beyond the nucleosome length and find that the complexity indices are informative of nucleosome occupancy. We found that, while it is clear that the gold standard Kaplan model is driven by GC content (by design) and by k-mer training; for high occupancy, entropy and complexity-based scores are also informative and can complement the Kaplan model. Oxford University Press 2019-11-18 2019-09-12 /pmc/articles/PMC6846163/ /pubmed/31511887 http://dx.doi.org/10.1093/nar/gkz750 Text en © The Author(s) 2019. Published by Oxford University Press on behalf of Nucleic Acids Research. http://creativecommons.org/licenses/by/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
title Training-free measures based on algorithmic probability identify high nucleosome occupancy in DNA sequences
topic Methods Online