ChromDMM: a Dirichlet-multinomial mixture model for clustering heterogeneous epigenetic data

Bibliographic Details
Main authors: Osmala, Maria, Eraslan, Gökçen, Lähdesmäki, Harri
Format: Online Article Text
Language: English
Published: Oxford University Press 2022
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9364382/
https://www.ncbi.nlm.nih.gov/pubmed/35786716
http://dx.doi.org/10.1093/bioinformatics/btac444
author Osmala, Maria
Eraslan, Gökçen
Lähdesmäki, Harri
collection PubMed
description MOTIVATION: Research on epigenetic modifications and other chromatin features at genomic regulatory elements elucidates essential biological mechanisms including the regulation of gene expression. Despite the growing number of epigenetic datasets, new tools are still needed to discover novel distinctive patterns of heterogeneous epigenetic signals at regulatory elements.
RESULTS: We introduce ChromDMM, a product Dirichlet-multinomial mixture model for clustering genomic regions that are characterized by multiple chromatin features. ChromDMM extends the mixture model framework by profile shifting and flipping that can probabilistically account for inaccuracies in the position and strand-orientation of the genomic regions. Owing to hyper-parameter optimization, ChromDMM can also regularize the smoothness of the epigenetic profiles across the consecutive genomic regions. With simulated data, we demonstrate that ChromDMM clusters, shifts and strand-orients the profiles more accurately than previous methods. With ENCODE data, we show that the clustering of enhancer regions in the human genome reveals distinct patterns in several chromatin features. We further validate the enhancer clusters by their enrichment for transcriptional regulatory factor binding sites.
AVAILABILITY AND IMPLEMENTATION: ChromDMM is implemented as an R package and is available at https://github.com/MariaOsmala/ChromDMM.
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
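The "product Dirichlet-multinomial mixture" named in the abstract can be illustrated with a minimal Python sketch. It is not the ChromDMM R implementation: the function names, the data layout (one count vector per chromatin feature) and the parameter values are assumptions made for illustration. Profile shifting, flipping and hyper-parameter optimization are omitted. The sketch only shows how per-feature Dirichlet-multinomial log-likelihoods are summed within a cluster (the "product" over features) and then normalized into posterior cluster probabilities.

```python
from math import lgamma, log, exp


def dm_log_pmf(counts, alpha):
    """Log-probability of a count vector under a Dirichlet-multinomial."""
    n = sum(counts)
    a = sum(alpha)
    # Multinomial coefficient: log n! - sum_j log x_j!
    out = lgamma(n + 1) - sum(lgamma(x + 1) for x in counts)
    # Normalizing ratio of the Dirichlet integral
    out += lgamma(a) - lgamma(n + a)
    out += sum(lgamma(x + aj) - lgamma(aj) for x, aj in zip(counts, alpha))
    return out


def responsibilities(region, log_weights, alphas):
    """Posterior cluster probabilities for one genomic region.

    region      -- list of count vectors, one per chromatin feature
                   (hypothetical layout, assumed for this sketch)
    log_weights -- log mixture weight of each cluster
    alphas      -- alphas[k][m] is the Dirichlet parameter vector of
                   cluster k for feature m
    """
    # Product over features becomes a sum of log-likelihoods.
    scores = [
        lw + sum(dm_log_pmf(x, a) for x, a in zip(region, alpha_k))
        for lw, alpha_k in zip(log_weights, alphas)
    ]
    # Subtract the maximum before exponentiating for numerical stability.
    top = max(scores)
    unnorm = [exp(s - top) for s in scores]
    total = sum(unnorm)
    return [u / total for u in unnorm]


# Toy example: two features with three bins each, two clusters.
region = [[8, 1, 1], [1, 1, 8]]
alphas = [
    [[5.0, 1.0, 1.0], [1.0, 1.0, 5.0]],  # cluster 0 matches the region's shape
    [[1.0, 1.0, 5.0], [5.0, 1.0, 1.0]],  # cluster 1 is the mirror image
]
post = responsibilities(region, [log(0.5), log(0.5)], alphas)
```

In this toy setup the region's counts follow the shape of cluster 0, so `post[0]` exceeds `post[1]`. A model with flipping, as the abstract describes, would additionally evaluate the reversed count vectors, which this sketch does not do.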
format Online
Article
Text
id pubmed-9364382
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-9364382 2022-08-11 Bioinformatics Original Papers Oxford University Press 2022-07-04 Text en © The Author(s) 2022. Published by Oxford University Press.
This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
title ChromDMM: a Dirichlet-multinomial mixture model for clustering heterogeneous epigenetic data
topic Original Papers