
Combinatorial epigenetic patterns as quantitative predictors of chromatin biology

BACKGROUND: Chromatin immunoprecipitation followed by deep sequencing (ChIP-seq) is the most widely used method for characterizing the epigenetic states of chromatin on a genomic scale. With the recent availability of large genome-wide data sets, often comprising several epigenetic marks, novel appr...


Bibliographic Details
Main authors: Cieślik, Marcin; Bekiranov, Stefan
Format: Online Article Text
Language: English
Published: BioMed Central 2014
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3922690/
https://www.ncbi.nlm.nih.gov/pubmed/24472558
http://dx.doi.org/10.1186/1471-2164-15-76
_version_ 1782303487577554944
author Cieślik, Marcin
Bekiranov, Stefan
author_facet Cieślik, Marcin
Bekiranov, Stefan
author_sort Cieślik, Marcin
collection PubMed
description BACKGROUND: Chromatin immunoprecipitation followed by deep sequencing (ChIP-seq) is the most widely used method for characterizing the epigenetic states of chromatin on a genomic scale. With the recent availability of large genome-wide data sets, often comprising several epigenetic marks, novel approaches are required to explore functionally relevant interactions between histone modifications. Computational discovery of "chromatin states" defined by such combinatorial interactions enabled descriptive annotations of genomes, but more quantitative approaches are needed to progress towards predictive models. RESULTS: We propose non-negative matrix factorization (NMF) as a new unsupervised method to discover combinatorial patterns of epigenetic marks that frequently co-occur in subsets of genomic regions. We show that this small set of combinatorial "codes" can be effectively displayed and interpreted. NMF codes enable dimensionality reduction and have desirable statistical properties for regression and classification tasks. We demonstrate the utility of codes in the quantitative prediction of Pol2-binding and the discrimination between Pol2-bound promoters and enhancers. Finally, we show that specific codes can be linked to molecular pathways and targets of pluripotency genes during differentiation. CONCLUSIONS: We have introduced and evaluated a new computational approach to represent combinatorial patterns of epigenetic marks as quantitative variables suitable for predictive modeling and supervised machine learning. To foster widespread adoption of this method we make it available as an open-source software-package – epicode at https://github.com/mcieslik-mctp/epicode.
format Online
Article
Text
id pubmed-3922690
institution National Center for Biotechnology Information
language English
publishDate 2014
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-3922690 2014-02-27 Combinatorial epigenetic patterns as quantitative predictors of chromatin biology Cieślik, Marcin Bekiranov, Stefan BMC Genomics Methodology Article
BioMed Central 2014-01-28 /pmc/articles/PMC3922690/ /pubmed/24472558 http://dx.doi.org/10.1186/1471-2164-15-76 Text en Copyright © 2014 Cieślik and Bekiranov; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License ( http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
spellingShingle Methodology Article
Cieślik, Marcin
Bekiranov, Stefan
Combinatorial epigenetic patterns as quantitative predictors of chromatin biology
title Combinatorial epigenetic patterns as quantitative predictors of chromatin biology
title_sort combinatorial epigenetic patterns as quantitative predictors of chromatin biology
topic Methodology Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3922690/
https://www.ncbi.nlm.nih.gov/pubmed/24472558
http://dx.doi.org/10.1186/1471-2164-15-76
work_keys_str_mv AT cieslikmarcin combinatorialepigeneticpatternsasquantitativepredictorsofchromatinbiology
AT bekiranovstefan combinatorialepigeneticpatternsasquantitativepredictorsofchromatinbiology
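
The abstract in this record describes non-negative matrix factorization (NMF) of epigenetic mark signals into a small set of combinatorial "codes". The sketch below illustrates that general idea only. The record does not say which NMF algorithm or interface epicode uses, so the code assumes the classic multiplicative-update rules for squared-error loss. The mark names, the signal values, and all function names are invented for illustration. It is written in plain Python to avoid dependencies; a real analysis would likely use a numerical library.

```python
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    cols_b = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols_b] for row in a]


def transpose(a):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*a)]


def frobenius_error(v, w, h):
    """Frobenius norm of V - WH, i.e. the reconstruction error."""
    wh = matmul(w, h)
    return sum((x - y) ** 2 for rv, rwh in zip(v, wh) for x, y in zip(rv, rwh)) ** 0.5


def nmf(v, rank, n_iter=500, seed=0, eps=1e-9):
    """Factor a non-negative matrix V (regions x marks) as W (regions x codes)
    times H (codes x marks) using multiplicative updates (an assumption,
    not necessarily what epicode does)."""
    rng = random.Random(seed)
    n, m = len(v), len(v[0])
    w = [[rng.random() + eps for _ in range(rank)] for _ in range(n)]
    h = [[rng.random() + eps for _ in range(m)] for _ in range(rank)]
    for _ in range(n_iter):
        # H <- H * (W^T V) / (W^T W H), element-wise
        wt = transpose(w)
        num = matmul(wt, v)
        den = matmul(matmul(wt, w), h)
        h = [[h[i][j] * num[i][j] / (den[i][j] + eps) for j in range(m)]
             for i in range(rank)]
        # W <- W * (V H^T) / (W H H^T), element-wise
        ht = transpose(h)
        num = matmul(v, ht)
        den = matmul(w, matmul(h, ht))
        w = [[w[i][j] * num[i][j] / (den[i][j] + eps) for j in range(rank)]
             for i in range(n)]
    return w, h


# Toy data (invented): rows are genomic regions, columns are mark signals.
marks = ["H3K4me3", "H3K27ac", "H3K4me1", "H3K27me3"]
signal = [
    [9, 7, 1, 0],
    [8, 6, 0, 0],
    [1, 5, 8, 0],
    [0, 6, 9, 1],
    [0, 0, 1, 9],
    [1, 0, 0, 8],
]

w, h = nmf(signal, rank=3)

# Each row of H is one "code": a non-negative weighting over the marks.
for k, code in enumerate(h):
    weights = ", ".join(f"{name}={value:.2f}" for name, value in zip(marks, code))
    print(f"code {k}: {weights}")
print(f"reconstruction error: {frobenius_error(signal, w, h):.3f}")
```

In this sketch the rows of `w` give each region's loading on every code, so they can stand in for the original mark signals as a lower-dimensional, non-negative feature set for regression or classification, which is the use the abstract describes.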