
Major cell-types in multiomic single-nucleus datasets impact statistical modeling of links between regulatory sequences and target genes

Epigenomic profiling, including ATACseq, is one of the main tools used to define enhancers. Because enhancers are overwhelmingly cell-type specific, inference of their activity is greatly limited in complex tissues. Multiomic assays that probe in the same nucleus both the open chromatin landscape an...


Bibliographic Details
Main Authors: Leblanc, Francis J. A., Lettre, Guillaume
Format: Online Article Text
Language: English
Published: Nature Publishing Group UK 2023
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9998442/
https://www.ncbi.nlm.nih.gov/pubmed/36894706
http://dx.doi.org/10.1038/s41598-023-31040-w
_version_ 1784903464231370752
author Leblanc, Francis J. A.
Lettre, Guillaume
collection PubMed
description Epigenomic profiling, including ATACseq, is one of the main tools used to define enhancers. Because enhancers are overwhelmingly cell-type specific, inference of their activity is greatly limited in complex tissues. Multiomic assays that probe in the same nucleus both the open chromatin landscape and gene expression levels enable the study of correlations (links) between these two modalities. Current best practices to infer the regulatory effect of candidate cis-regulatory elements (cCREs) in multiomic data involve removing biases associated with GC content by generating null distributions of matched ATACseq peaks drawn from different chromosomes. This strategy has been broadly adopted by popular single-nucleus multiomic workflows such as Signac. Here, we uncovered limitations and confounders of this approach. We found a strong loss of power to detect a regulatory effect for cCREs with high read counts in the dominant cell-type. We showed that this is largely due to cell-type-specific trans-ATACseq peak correlations creating bimodal null distributions. We tested alternative models and concluded that physical distance and/or the raw Pearson correlation coefficients are the best predictors for peak-gene links when compared to predictions from Epimap (e.g. CD14 area under the curve [AUC] = 0.51 with the method implemented in Signac vs. 0.71 with the Pearson correlation coefficients) or validation by CRISPR perturbations (AUC = 0.63 vs. 0.73).
format Online
Article
Text
id pubmed-9998442
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher Nature Publishing Group UK
record_format MEDLINE/PubMed
spelling pubmed-9998442 2023-03-11 Major cell-types in multiomic single-nucleus datasets impact statistical modeling of links between regulatory sequences and target genes Leblanc, Francis J. A. Lettre, Guillaume Sci Rep Article
Nature Publishing Group UK 2023-03-09 /pmc/articles/PMC9998442/ /pubmed/36894706 http://dx.doi.org/10.1038/s41598-023-31040-w Text en © The Author(s) 2023 https://creativecommons.org/licenses/by/4.0/ Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
title Major cell-types in multiomic single-nucleus datasets impact statistical modeling of links between regulatory sequences and target genes
topic Article