Comprehensive enhancer-target gene assignments improve gene set level interpretation of genome-wide regulatory data

Bibliographic Details
Main authors: Qin, Tingting; Lee, Christopher; Li, Shiting; Cavalcante, Raymond G.; Orchard, Peter; Yao, Heming; Zhang, Hanrui; Wang, Shuze; Patil, Snehal; Boyle, Alan P.; Sartor, Maureen A.
Format: Online Article Text
Language: English
Published: BioMed Central 2022
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9044877/
https://www.ncbi.nlm.nih.gov/pubmed/35473573
http://dx.doi.org/10.1186/s13059-022-02668-0
Description
Summary:
BACKGROUND: Revealing the gene targets of distal regulatory elements is challenging yet critical for interpreting regulome data. Experiment-derived enhancer-gene links are restricted to a small set of enhancers and/or cell types, while the accuracy of genome-wide approaches remains elusive due to the lack of a systematic evaluation. We combined multiple spatial and in silico approaches for defining enhancer locations and linking them to their target genes aggregated across >500 cell types, generating 1860 human genome-wide distal enhancer-to-target gene definitions (EnTDefs). To evaluate performance, we used gene set enrichment (GSE) testing on 87 independent ENCODE ChIP-seq datasets of 34 transcription factors (TFs) and assessed concordance of results with known TF Gene Ontology annotations, and other benchmarks.

RESULTS: The top ranked 741 (40%) EnTDefs significantly outperform the common, naïve approach of linking distal regions to the nearest genes, and the top 10 EnTDefs perform well when applied to ChIP-seq data of other cell types. The GSE-based ranking of EnTDefs is highly concordant with ranking based on overlap with curated benchmarks of enhancer-gene interactions. Both our top general EnTDef and cell-type-specific EnTDefs significantly outperform seven independent computational and experiment-based enhancer-gene pair datasets. We show that using our top EnTDefs for GSE with either genome-wide DNA methylation or ATAC-seq data is able to better recapitulate the biological processes changed in gene expression data generated in parallel for the same experiment than our lower-ranked EnTDefs.

CONCLUSIONS: Our findings illustrate the power of our approach to provide genome-wide interpretation regardless of cell type.

SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1186/s13059-022-02668-0.
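
For readers unfamiliar with the baseline named in the summary, the "naïve approach of linking distal regions to the nearest genes" can be sketched roughly as below. This is an illustrative sketch only, not the authors' code or the EnTDef method: the function name `nearest_gene`, the toy table `TOY_TSS`, and all gene names and coordinates are hypothetical, and the sketch assumes a single chromosome, a non-empty list of transcription start sites (TSSs) sorted by position, and regions summarized by their midpoint.

```python
from bisect import bisect_left

# Hypothetical toy data: (TSS position, gene name) on one chromosome,
# sorted by position. Real annotations would come from a gene model file.
TOY_TSS = [(1000, "GENE_A"), (5000, "GENE_B"), (9000, "GENE_C")]


def nearest_gene(region_mid, tss_sorted):
    """Return the gene whose TSS is closest to a region midpoint.

    Assumes tss_sorted is a non-empty list of (position, gene) tuples
    sorted by position. Ties go to the upstream (left) TSS.
    """
    positions = [pos for pos, _ in tss_sorted]
    # Index of the first TSS at or after the region midpoint.
    i = bisect_left(positions, region_mid)
    # Only the TSS just before and just at/after the midpoint can be nearest.
    candidates = tss_sorted[max(i - 1, 0): i + 1]
    return min(candidates, key=lambda t: abs(t[0] - region_mid))[1]


if __name__ == "__main__":
    for mid in (100, 4200, 6900, 20000):
        print(mid, nearest_gene(mid, TOY_TSS))
```

The nearest-gene rule ignores enhancers that skip over the closest gene to regulate a more distant one, which is the kind of link the enhancer-to-target definitions described in the summary are meant to capture.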