A machine learning framework for the prediction of chromatin folding in Drosophila using epigenetic features

Technological advances have led to the creation of large epigenetic datasets, including information about DNA binding proteins and DNA spatial structure. Hi-C experiments have revealed that chromosomes are subdivided into sets of self-interacting domains called Topologically Associating Domains (TA...

Bibliographic Details
Main authors: Rozenwald, Michal B., Galitsyna, Aleksandra A., Sapunov, Grigory V., Khrameeva, Ekaterina E., Gelfand, Mikhail S.
Format: Online Article Text
Language: English
Published: PeerJ Inc. 2020
Subjects: Bioinformatics
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7924456/
https://www.ncbi.nlm.nih.gov/pubmed/33816958
http://dx.doi.org/10.7717/peerj-cs.307
author Rozenwald, Michal B.
Galitsyna, Aleksandra A.
Sapunov, Grigory V.
Khrameeva, Ekaterina E.
Gelfand, Mikhail S.
collection PubMed
description Technological advances have led to the creation of large epigenetic datasets, including information about DNA binding proteins and DNA spatial structure. Hi-C experiments have revealed that chromosomes are subdivided into sets of self-interacting domains called Topologically Associating Domains (TADs). TADs are involved in the regulation of gene expression activity, but the mechanisms of their formation are not yet fully understood. Here, we focus on machine learning methods to characterize DNA folding patterns in Drosophila based on chromatin marks across three cell lines. We present linear regression models with four types of regularization, gradient boosting, and recurrent neural networks (RNN) as tools to study chromatin folding characteristics associated with TADs given epigenetic chromatin immunoprecipitation data. The bidirectional long short-term memory RNN architecture produced the best prediction scores and identified biologically relevant features. The distributions of the protein Chriz (Chromator) and the histone modification H3K4me3 were selected as the most informative features for the prediction of TAD characteristics. This approach may be adapted to any similar biological dataset of chromatin features across various cell lines and species. The code for the implemented pipeline, Hi-ChIP-ML, is publicly available: https://github.com/MichalRozenwald/Hi-ChIP-ML
format Online
Article
Text
id pubmed-7924456
institution National Center for Biotechnology Information
language English
publishDate 2020
publisher PeerJ Inc.
record_format MEDLINE/PubMed
spelling pubmed-7924456 2021-04-02 PeerJ Comput Sci Bioinformatics PeerJ Inc. 2020-11-30 /pmc/articles/PMC7924456/ /pubmed/33816958 http://dx.doi.org/10.7717/peerj-cs.307 Text en ©2020 Rozenwald et al.
https://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, reproduction and adaptation in any medium and for any purpose provided that it is properly attributed. For attribution, the original author(s), title, publication source (PeerJ Computer Science) and either DOI or URL of the article must be cited.
title A machine learning framework for the prediction of chromatin folding in Drosophila using epigenetic features
topic Bioinformatics