Major cell-types in multiomic single-nucleus datasets impact statistical modeling of links between regulatory sequences and target genes
Epigenomic profiling, including ATACseq, is one of the main tools used to define enhancers. Because enhancers are overwhelmingly cell-type specific, inference of their activity is greatly limited in complex tissues. Multiomic assays that probe in the same nucleus both the open chromatin landscape an...
Main authors: | Leblanc, Francis J. A.; Lettre, Guillaume |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Nature Publishing Group UK 2023 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9998442/ https://www.ncbi.nlm.nih.gov/pubmed/36894706 http://dx.doi.org/10.1038/s41598-023-31040-w |
_version_ | 1784903464231370752 |
---|---|
author | Leblanc, Francis J. A.; Lettre, Guillaume |
author_facet | Leblanc, Francis J. A.; Lettre, Guillaume |
author_sort | Leblanc, Francis J. A. |
collection | PubMed |
description | Epigenomic profiling, including ATACseq, is one of the main tools used to define enhancers. Because enhancers are overwhelmingly cell-type specific, inference of their activity is greatly limited in complex tissues. Multiomic assays that probe in the same nucleus both the open chromatin landscape and gene expression levels enable the study of correlations (links) between these two modalities. Current best practices to infer the regulatory effect of candidate cis-regulatory elements (cCREs) in multiomic data involve removing biases associated with GC content by generating null distributions of matched ATACseq peaks drawn from different chromosomes. This strategy has been broadly adopted by popular single-nucleus multiomic workflows such as Signac. Here, we uncovered limitations and confounders of this approach. We found a strong loss of power to detect a regulatory effect for cCREs with high read counts in the dominant cell-type. We showed that this is largely due to cell-type-specific trans-ATACseq peak correlations creating bimodal null distributions. We tested alternative models and concluded that physical distance and/or the raw Pearson correlation coefficients are the best predictors for peak-gene links when compared to predictions from Epimap (e.g. CD14 area under the curve [AUC] = 0.51 with the method implemented in Signac vs. 0.71 with the Pearson correlation coefficients) or validation by CRISPR perturbations (AUC = 0.63 vs. 0.73). |
format | Online Article Text |
id | pubmed-9998442 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2023 |
publisher | Nature Publishing Group UK |
record_format | MEDLINE/PubMed |
spelling | pubmed-9998442 2023-03-11 Major cell-types in multiomic single-nucleus datasets impact statistical modeling of links between regulatory sequences and target genes Leblanc, Francis J. A.; Lettre, Guillaume Sci Rep Article Nature Publishing Group UK 2023-03-09 /pmc/articles/PMC9998442/ /pubmed/36894706 http://dx.doi.org/10.1038/s41598-023-31040-w Text en © The Author(s) 2023 https://creativecommons.org/licenses/by/4.0/ Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. |
spellingShingle | Article Leblanc, Francis J. A. Lettre, Guillaume Major cell-types in multiomic single-nucleus datasets impact statistical modeling of links between regulatory sequences and target genes |
title | Major cell-types in multiomic single-nucleus datasets impact statistical modeling of links between regulatory sequences and target genes |
title_sort | major cell-types in multiomic single-nucleus datasets impact statistical modeling of links between regulatory sequences and target genes |
topic | Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9998442/ https://www.ncbi.nlm.nih.gov/pubmed/36894706 http://dx.doi.org/10.1038/s41598-023-31040-w |
work_keys_str_mv | AT leblancfrancisja majorcelltypesinmultiomicsinglenucleusdatasetsimpactstatisticalmodelingoflinksbetweenregulatorysequencesandtargetgenes AT lettreguillaume majorcelltypesinmultiomicsinglenucleusdatasetsimpactstatisticalmodelingoflinksbetweenregulatorysequencesandtargetgenes |
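
The description field above contrasts two ways of scoring a peak-gene link: the raw Pearson correlation coefficient, and a score normalized against a null distribution built from background peaks on other chromosomes. It then compares the two by area under the curve (AUC). The following is a minimal sketch of those three calculations, assuming per-cell accessibility and expression vectors are already available as numeric arrays. The function names (`pearson_r`, `null_zscore`, `pairwise_auc`) are hypothetical, the matching of background peaks on GC content is not shown, and this is not the code used by Signac or by the authors.

```python
import numpy as np


def pearson_r(x, y):
    """Pearson correlation between two equal-length vectors (0.0 if either is constant)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    xc = x - x.mean()
    yc = y - y.mean()
    denom = np.sqrt((xc ** 2).sum() * (yc ** 2).sum())
    return float((xc * yc).sum() / denom) if denom > 0 else 0.0


def null_zscore(gene_expr, peak_access, background_access):
    """Score a peak-gene link against a null built from background peaks.

    gene_expr:          expression of the gene, one value per cell
    peak_access:        accessibility of the candidate peak, one value per cell
    background_access:  iterable of accessibility vectors for background peaks
                        (assumed here to be pre-selected trans peaks)
    Returns (observed r, z-score of the observed r within the null).
    """
    observed = pearson_r(peak_access, gene_expr)
    null = np.array([pearson_r(bg, gene_expr) for bg in background_access])
    sd = null.std(ddof=1)
    z = (observed - null.mean()) / sd if sd > 0 else 0.0
    return observed, float(z)


def pairwise_auc(scores, labels):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    pos = scores[labels]
    neg = scores[~labels]
    if pos.size == 0 or neg.size == 0:
        raise ValueError("need at least one positive and one negative label")
    diff = pos[:, None] - neg[None, :]
    return float(((diff > 0).sum() + 0.5 * (diff == 0).sum()) / diff.size)
```

If the background correlations cluster into two groups (for example, peaks that are open mainly in the dominant cell-type versus peaks that are not), the null's standard deviation grows and the z-score of a genuinely correlated peak shrinks. That is one way to read the loss of power reported in the description.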