ChromDMM: a Dirichlet-multinomial mixture model for clustering heterogeneous epigenetic data
MOTIVATION: Research on epigenetic modifications and other chromatin features at genomic regulatory elements elucidates essential biological mechanisms including the regulation of gene expression. Despite the growing number of epigenetic datasets, new tools are still needed to discover novel distinc...
Main authors: | Osmala, Maria; Eraslan, Gökçen; Lähdesmäki, Harri |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Oxford University Press 2022 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9364382/ https://www.ncbi.nlm.nih.gov/pubmed/35786716 http://dx.doi.org/10.1093/bioinformatics/btac444 |
_version_ | 1784765135360884736 |
---|---|
author | Osmala, Maria Eraslan, Gökçen Lähdesmäki, Harri |
author_facet | Osmala, Maria Eraslan, Gökçen Lähdesmäki, Harri |
author_sort | Osmala, Maria |
collection | PubMed |
description | MOTIVATION: Research on epigenetic modifications and other chromatin features at genomic regulatory elements elucidates essential biological mechanisms including the regulation of gene expression. Despite the growing number of epigenetic datasets, new tools are still needed to discover novel distinctive patterns of heterogeneous epigenetic signals at regulatory elements. RESULTS: We introduce ChromDMM, a product Dirichlet-multinomial mixture model for clustering genomic regions that are characterized by multiple chromatin features. ChromDMM extends the mixture model framework by profile shifting and flipping that can probabilistically account for inaccuracies in the position and strand-orientation of the genomic regions. Owing to hyper-parameter optimization, ChromDMM can also regularize the smoothness of the epigenetic profiles across the consecutive genomic regions. With simulated data, we demonstrate that ChromDMM clusters, shifts and strand-orients the profiles more accurately than previous methods. With ENCODE data, we show that the clustering of enhancer regions in the human genome reveals distinct patterns in several chromatin features. We further validate the enhancer clusters by their enrichment for transcriptional regulatory factor binding sites. AVAILABILITY AND IMPLEMENTATION: ChromDMM is implemented as an R package and is available at https://github.com/MariaOsmala/ChromDMM. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online. |
format | Online Article Text |
id | pubmed-9364382 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2022 |
publisher | Oxford University Press |
record_format | MEDLINE/PubMed |
spelling | pubmed-9364382 2022-08-11 ChromDMM: a Dirichlet-multinomial mixture model for clustering heterogeneous epigenetic data Osmala, Maria Eraslan, Gökçen Lähdesmäki, Harri Bioinformatics Original Papers MOTIVATION: Research on epigenetic modifications and other chromatin features at genomic regulatory elements elucidates essential biological mechanisms including the regulation of gene expression. Despite the growing number of epigenetic datasets, new tools are still needed to discover novel distinctive patterns of heterogeneous epigenetic signals at regulatory elements. RESULTS: We introduce ChromDMM, a product Dirichlet-multinomial mixture model for clustering genomic regions that are characterized by multiple chromatin features. ChromDMM extends the mixture model framework by profile shifting and flipping that can probabilistically account for inaccuracies in the position and strand-orientation of the genomic regions. Owing to hyper-parameter optimization, ChromDMM can also regularize the smoothness of the epigenetic profiles across the consecutive genomic regions. With simulated data, we demonstrate that ChromDMM clusters, shifts and strand-orients the profiles more accurately than previous methods. With ENCODE data, we show that the clustering of enhancer regions in the human genome reveals distinct patterns in several chromatin features. We further validate the enhancer clusters by their enrichment for transcriptional regulatory factor binding sites. AVAILABILITY AND IMPLEMENTATION: ChromDMM is implemented as an R package and is available at https://github.com/MariaOsmala/ChromDMM. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online. Oxford University Press 2022-07-04 /pmc/articles/PMC9364382/ /pubmed/35786716 http://dx.doi.org/10.1093/bioinformatics/btac444 Text en © The Author(s) 2022. Published by Oxford University Press. 
https://creativecommons.org/licenses/by/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited. |
spellingShingle | Original Papers Osmala, Maria Eraslan, Gökçen Lähdesmäki, Harri ChromDMM: a Dirichlet-multinomial mixture model for clustering heterogeneous epigenetic data |
title | ChromDMM: a Dirichlet-multinomial mixture model for clustering heterogeneous epigenetic data |
title_full | ChromDMM: a Dirichlet-multinomial mixture model for clustering heterogeneous epigenetic data |
title_fullStr | ChromDMM: a Dirichlet-multinomial mixture model for clustering heterogeneous epigenetic data |
title_full_unstemmed | ChromDMM: a Dirichlet-multinomial mixture model for clustering heterogeneous epigenetic data |
title_short | ChromDMM: a Dirichlet-multinomial mixture model for clustering heterogeneous epigenetic data |
title_sort | chromdmm: a dirichlet-multinomial mixture model for clustering heterogeneous epigenetic data |
topic | Original Papers |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9364382/ https://www.ncbi.nlm.nih.gov/pubmed/35786716 http://dx.doi.org/10.1093/bioinformatics/btac444 |
work_keys_str_mv | AT osmalamaria chromdmmadirichletmultinomialmixturemodelforclusteringheterogeneousepigeneticdata AT eraslangokcen chromdmmadirichletmultinomialmixturemodelforclusteringheterogeneousepigeneticdata AT lahdesmakiharri chromdmmadirichletmultinomialmixturemodelforclusteringheterogeneousepigeneticdata |