
OLOGRAM-MODL: mining enriched n-wise combinations of genomic features with Monte Carlo and dictionary learning

Most epigenetic marks, such as Transcriptional Regulators or histone marks, are biological objects known to work together in n-wise complexes. A suitable way to infer such functional associations between them is to study the overlaps of the corresponding genomic regions. However, the problem of the...


Bibliographic Details
Main authors:	Ferré, Quentin, Capponi, Cécile, Puthier, Denis
Format:	Online Article Text
Language:	English
Published:	Oxford University Press 2021
Subjects:	Methods Article
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8693575/ https://www.ncbi.nlm.nih.gov/pubmed/34988437 http://dx.doi.org/10.1093/nargab/lqab114

_version_	1784619170213658624
author	Ferré, Quentin Capponi, Cécile Puthier, Denis
author_facet	Ferré, Quentin Capponi, Cécile Puthier, Denis
author_sort	Ferré, Quentin
collection	PubMed
description	Most epigenetic marks, such as Transcriptional Regulators or histone marks, are biological objects known to work together in n-wise complexes. A suitable way to infer such functional associations between them is to study the overlaps of the corresponding genomic regions. However, the problem of the statistical significance of n-wise overlaps of genomic features is seldom tackled, which prevents rigorous studies of n-wise interactions. We introduce OLOGRAM-MODL, which considers overlaps between n ≥ 2 sets of genomic regions, and computes their statistical mutual enrichment by Monte Carlo fitting of a Negative Binomial distribution, resulting in higher-resolution P-values. An optional machine learning method is proposed to find complexes of interest, using a new itemset mining algorithm based on dictionary learning, which is resistant to the noise inherent to biological assays. The overall approach is implemented through an easy-to-use command-line interface for workflow integration, and a visual tree-based representation of the results suited for explainability. The viability of the method is experimentally studied using both artificial and biological data. This approach is accessible through the command line interface of the pygtftk toolkit, available on Bioconda and from https://github.com/dputhier/pygtftk
format	Online Article Text
id	pubmed-8693575
institution	National Center for Biotechnology Information
language	English
publishDate	2021
publisher	Oxford University Press
record_format	MEDLINE/PubMed
spelling	pubmed-8693575 2022-01-04 OLOGRAM-MODL: mining enriched n-wise combinations of genomic features with Monte Carlo and dictionary learning Ferré, Quentin Capponi, Cécile Puthier, Denis NAR Genom Bioinform Methods Article Most epigenetic marks, such as Transcriptional Regulators or histone marks, are biological objects known to work together in n-wise complexes. A suitable way to infer such functional associations between them is to study the overlaps of the corresponding genomic regions. However, the problem of the statistical significance of n-wise overlaps of genomic features is seldom tackled, which prevents rigorous studies of n-wise interactions. We introduce OLOGRAM-MODL, which considers overlaps between n ≥ 2 sets of genomic regions, and computes their statistical mutual enrichment by Monte Carlo fitting of a Negative Binomial distribution, resulting in higher-resolution P-values. An optional machine learning method is proposed to find complexes of interest, using a new itemset mining algorithm based on dictionary learning, which is resistant to the noise inherent to biological assays. The overall approach is implemented through an easy-to-use command-line interface for workflow integration, and a visual tree-based representation of the results suited for explainability. The viability of the method is experimentally studied using both artificial and biological data. This approach is accessible through the command line interface of the pygtftk toolkit, available on Bioconda and from https://github.com/dputhier/pygtftk Oxford University Press 2021-12-22 /pmc/articles/PMC8693575/ /pubmed/34988437 http://dx.doi.org/10.1093/nargab/lqab114 Text en © The Author(s) 2021. Published by Oxford University Press on behalf of NAR Genomics and Bioinformatics.
https://creativecommons.org/licenses/by/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
spellingShingle	Methods Article Ferré, Quentin Capponi, Cécile Puthier, Denis OLOGRAM-MODL: mining enriched n-wise combinations of genomic features with Monte Carlo and dictionary learning
title	OLOGRAM-MODL: mining enriched n-wise combinations of genomic features with Monte Carlo and dictionary learning
title_full	OLOGRAM-MODL: mining enriched n-wise combinations of genomic features with Monte Carlo and dictionary learning
title_fullStr	OLOGRAM-MODL: mining enriched n-wise combinations of genomic features with Monte Carlo and dictionary learning
title_full_unstemmed	OLOGRAM-MODL: mining enriched n-wise combinations of genomic features with Monte Carlo and dictionary learning
title_short	OLOGRAM-MODL: mining enriched n-wise combinations of genomic features with Monte Carlo and dictionary learning
title_sort	ologram-modl: mining enriched n-wise combinations of genomic features with monte carlo and dictionary learning
topic	Methods Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8693575/ https://www.ncbi.nlm.nih.gov/pubmed/34988437 http://dx.doi.org/10.1093/nargab/lqab114
work_keys_str_mv	AT ferrequentin ologrammodlminingenrichednwisecombinationsofgenomicfeatureswithmontecarloanddictionarylearning AT capponicecile ologrammodlminingenrichednwisecombinationsofgenomicfeatureswithmontecarloanddictionarylearning AT puthierdenis ologrammodlminingenrichednwisecombinationsofgenomicfeatureswithmontecarloanddictionarylearning
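
The description above mentions computing enrichment by Monte Carlo fitting of a Negative Binomial distribution. The following is a minimal sketch of that general idea, not the pygtftk implementation: it handles only two region sets on a single sequence, shuffles regions uniformly, and fits the distribution by the method of moments. All function names, the shuffling scheme, and the toy data are assumptions made for illustration.

```python
import math
import random


def count_overlaps(a, b):
    """Number of (region in a, region in b) pairs that overlap; intervals are half-open."""
    return sum(1 for s1, e1 in a for s2, e2 in b if s1 < e2 and s2 < e1)


def shuffle_regions(regions, genome_len, rng):
    """Assumed null model: each region keeps its length and gets a uniform random start."""
    shuffled = []
    for start, end in regions:
        length = end - start
        new_start = rng.randrange(0, genome_len - length + 1)
        shuffled.append((new_start, new_start + length))
    return shuffled


def fit_neg_binom(samples):
    """Method-of-moments fit. Returns (r, p) such that the mean is r * (1 - p) / p."""
    n = len(samples)
    mean = max(sum(samples) / n, 1e-6)           # avoid a degenerate zero mean
    var = sum((x - mean) ** 2 for x in samples) / (n - 1)
    var = max(var, mean + 1e-6)                  # the fit needs variance > mean
    p = mean / var
    r = mean * mean / (var - mean)
    return r, p


def neg_binom_sf(k, r, p, max_terms=100000):
    """P(X >= k) for X ~ NB(r, p), summing the upper tail directly in log space."""
    mean = r * (1.0 - p) / p
    total = 0.0
    for i in range(k, k + max_terms):
        log_pmf = (math.lgamma(i + r) - math.lgamma(r) - math.lgamma(i + 1)
                   + r * math.log(p) + i * math.log1p(-p))
        term = math.exp(log_pmf)
        total += term
        # Past the mean the terms only shrink; stop once they no longer matter.
        if i > mean and term <= total * 1e-17:
            break
    return min(1.0, total)


def enrichment_p_value(a, b, genome_len, n_shuffles=200, seed=0):
    """Observed overlap count and its P-value under the fitted null distribution."""
    rng = random.Random(seed)
    observed = count_overlaps(a, b)
    null_counts = [
        count_overlaps(shuffle_regions(a, genome_len, rng),
                       shuffle_regions(b, genome_len, rng))
        for _ in range(n_shuffles)
    ]
    r, p = fit_neg_binom(null_counts)
    return observed, neg_binom_sf(observed, r, p)


# Toy data (made up): 20 regions in each set, every pair deliberately co-located.
set_a = [(i * 5000, i * 5000 + 100) for i in range(20)]
set_b = [(i * 5000 + 50, i * 5000 + 150) for i in range(20)]
observed, p_value = enrichment_p_value(set_a, set_b, genome_len=100_000)
print(observed, p_value)
```

Fitting a parametric distribution to the shuffled counts is what allows P-values smaller than 1 / n_shuffles, which a purely empirical count of shuffles exceeding the observed value cannot resolve.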
