Variational infinite heterogeneous mixture model for semi-supervised clustering of heart enhancers
MOTIVATION: Mammalian genomes can contain thousands of enhancers but only a subset are actively driving gene expression in a given cellular context. Integrated genomic datasets can be harnessed to predict active enhancers. One challenge in integration of large genomic datasets is the increasing hete...
Main authors: | Mehdi, Tahmid F; Singh, Gurdeep; Mitchell, Jennifer A; Moses, Alan M |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Oxford University Press 2019 |
Subjects: | Original Papers |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6748727/ https://www.ncbi.nlm.nih.gov/pubmed/30753279 http://dx.doi.org/10.1093/bioinformatics/btz064 |
_version_ | 1783452140979617792 |
---|---|
author | Mehdi, Tahmid F Singh, Gurdeep Mitchell, Jennifer A Moses, Alan M |
collection | PubMed |
description | MOTIVATION: Mammalian genomes can contain thousands of enhancers but only a subset are actively driving gene expression in a given cellular context. Integrated genomic datasets can be harnessed to predict active enhancers. One challenge in integration of large genomic datasets is the increasing heterogeneity: continuous, binary and discrete features may all be relevant. Coupled with the typically small numbers of training examples, semi-supervised approaches for heterogeneous data are needed; however, current enhancer prediction methods are not designed to handle heterogeneous data in the semi-supervised paradigm. RESULTS: We implemented a Dirichlet Process Heterogeneous Mixture model that infers Gaussian, Bernoulli and Poisson distributions over features. We derived a novel variational inference algorithm to handle semi-supervised learning tasks where certain observations are forced to cluster together. We applied this model to enhancer candidates in mouse heart tissues based on heterogeneous features. We constrained a small number of known active enhancers to appear in the same cluster, and 47 additional regions clustered with them. Many of these are located near heart-specific genes. The model also predicted 1176 active promoters, suggesting that it can discover new enhancers and promoters. AVAILABILITY AND IMPLEMENTATION: We created the ‘dphmix’ Python package: https://pypi.org/project/dphmix/. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online. |
format | Online Article Text |
id | pubmed-6748727 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2019 |
publisher | Oxford University Press |
record_format | MEDLINE/PubMed |
spelling | pubmed-6748727 2019-09-23 Variational infinite heterogeneous mixture model for semi-supervised clustering of heart enhancers Mehdi, Tahmid F Singh, Gurdeep Mitchell, Jennifer A Moses, Alan M Bioinformatics Original Papers Oxford University Press 2019-09-15 2019-02-07 /pmc/articles/PMC6748727/ /pubmed/30753279 http://dx.doi.org/10.1093/bioinformatics/btz064 Text en © The Author(s) 2019. Published by Oxford University Press. http://creativecommons.org/licenses/by-nc/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com |
title | Variational infinite heterogeneous mixture model for semi-supervised clustering of heart enhancers |
topic | Original Papers |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6748727/ https://www.ncbi.nlm.nih.gov/pubmed/30753279 http://dx.doi.org/10.1093/bioinformatics/btz064 |
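
The description field summarizes a Dirichlet Process mixture whose clusters place Gaussian, Bernoulli and Poisson distributions over features, with some observations forced to cluster together. The sketch below is only an illustration of that idea and is not the dphmix package's API or the authors' algorithm. It assumes fixed point estimates for the cluster parameters instead of variational posteriors, a truncated stick-breaking construction for the weights, and it models the "cluster together" constraint by giving a whole group one shared soft assignment. All function names and the feature keys (`signal`, `bound`, `count`) are hypothetical.

```python
import math


def log_gaussian(x, mean, var):
    """Log density of a continuous feature under a Gaussian."""
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)


def log_bernoulli(x, p):
    """Log probability of a binary feature (x is 0 or 1, 0 < p < 1)."""
    return math.log(p) if x == 1 else math.log(1.0 - p)


def log_poisson(k, rate):
    """Log probability of a count feature under a Poisson (rate > 0)."""
    return k * math.log(rate) - rate - math.lgamma(k + 1)


def stick_breaking(fractions):
    """Turn stick fractions into mixture weights.

    The final weight takes whatever stick is left, which is how a
    truncated approximation keeps the weights summing to one.
    """
    weights, remaining = [], 1.0
    for frac in fractions:
        weights.append(remaining * frac)
        remaining *= 1.0 - frac
    weights.append(remaining)
    return weights


def component_loglik(obs, comp):
    """Assumes features are independent given the cluster, so logs add."""
    return (log_gaussian(obs["signal"], comp["mean"], comp["var"])
            + log_bernoulli(obs["bound"], comp["p"])
            + log_poisson(obs["count"], comp["rate"]))


def responsibilities(group, comps, weights):
    """Soft cluster assignment shared by every observation in `group`.

    A group with one member is an ordinary unconstrained observation;
    a larger group is a set of observations constrained to co-cluster.
    """
    scores = [math.log(w) + sum(component_loglik(o, c) for o in group)
              for w, c in zip(weights, comps)]
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical two-cluster example: cluster 0 looks "active".
comps = [
    {"mean": 5.0, "var": 1.0, "p": 0.9, "rate": 8.0},
    {"mean": 0.0, "var": 1.0, "p": 0.1, "rate": 1.0},
]
weights = stick_breaking([0.5])
known_active = [
    {"signal": 4.8, "bound": 1, "count": 7},
    {"signal": 5.3, "bound": 1, "count": 9},
]
shared = responsibilities(known_active, comps, weights)
```

Summing log-likelihoods over the group before normalizing is what ties the constrained observations to a single assignment; a full variational treatment would replace the fixed parameters with expectations under the approximate posterior.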