Predicting unrecognized enhancer-mediated genome topology by an ensemble machine learning model

Transcriptional enhancers commonly work over long genomic distances to precisely regulate spatiotemporal gene expression patterns. Dissecting the promoters physically contacted by these distal regulatory elements is essential for understanding developmental processes as well as the role of disease-a...

Bibliographic Details
Main authors: Tang, Li, Hill, Matthew C., Wang, Jun, Wang, Jianxin, Martin, James F., Li, Min
Format: Online Article Text
Language: English
Published: Cold Spring Harbor Laboratory Press 2020
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7706734/
https://www.ncbi.nlm.nih.gov/pubmed/33184104
http://dx.doi.org/10.1101/gr.264606.120
author Tang, Li
Hill, Matthew C.
Wang, Jun
Wang, Jianxin
Martin, James F.
Li, Min
author_sort Tang, Li
collection PubMed
description Transcriptional enhancers commonly work over long genomic distances to precisely regulate spatiotemporal gene expression patterns. Dissecting the promoters physically contacted by these distal regulatory elements is essential for understanding developmental processes as well as the role of disease-associated risk variants. Modern proximity-ligation assays, such as HiChIP and ChIA-PET, facilitate the accurate identification of long-range contacts between enhancers and promoters. However, these assays are technically challenging, expensive, and time-consuming, making it difficult to investigate enhancer topologies, especially in uncharacterized cell types. To overcome these shortcomings, we designed LoopPredictor, an ensemble machine learning model, to predict genome topology for cell types that lack long-range contact maps. To enrich for functional enhancer-promoter loops over common structural genomic contacts, we trained LoopPredictor with both H3K27ac and YY1 HiChIP data. Moreover, the integration of several related multi-omics features facilitated the identification and annotation of the predicted loops. LoopPredictor is able to efficiently identify cell type–specific enhancer-mediated loops and promoter–promoter interactions with a modest feature input requirement. We found that LoopPredictor was able to identify functional enhancer loops comparably to experimentally generated H3K27ac HiChIP data. Furthermore, to explore the cross-species prediction capability of LoopPredictor, we fed mouse multi-omics features into a model trained on human data and found that the predicted enhancer loop outputs were highly conserved. LoopPredictor enables the dissection of cell type–specific long-range gene regulation and can accelerate the identification of distal disease-associated risk variants.
format Online
Article
Text
id pubmed-7706734
institution National Center for Biotechnology Information
language English
publishDate 2020
publisher Cold Spring Harbor Laboratory Press
record_format MEDLINE/PubMed
spelling pubmed-7706734 2021-06-01 Predicting unrecognized enhancer-mediated genome topology by an ensemble machine learning model Tang, Li Hill, Matthew C. Wang, Jun Wang, Jianxin Martin, James F. Li, Min Genome Res Method Cold Spring Harbor Laboratory Press 2020-12 /pmc/articles/PMC7706734/ /pubmed/33184104 http://dx.doi.org/10.1101/gr.264606.120 Text en © 2020 Tang et al.; Published by Cold Spring Harbor Laboratory Press http://creativecommons.org/licenses/by-nc/4.0/ This article is distributed exclusively by Cold Spring Harbor Laboratory Press for the first six months after the full-issue publication date (see http://genome.cshlp.org/site/misc/terms.xhtml). After six months, it is available under a Creative Commons License (Attribution-NonCommercial 4.0 International), as described at http://creativecommons.org/licenses/by-nc/4.0/.
title Predicting unrecognized enhancer-mediated genome topology by an ensemble machine learning model
topic Method
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7706734/
https://www.ncbi.nlm.nih.gov/pubmed/33184104
http://dx.doi.org/10.1101/gr.264606.120