scEpiLock: A Weakly Supervised Learning Framework for cis-Regulatory Element Localization and Variant Impact Quantification for Single-Cell Epigenetic Data
Recent advances in single-cell transposase-accessible chromatin using a sequencing assay (scATAC-seq) allow cellular heterogeneity dissection and regulatory landscape reconstruction with an unprecedented resolution. However, compared to bulk-sequencing, its ultra-high missingness remarkably reduces...
Main authors: | Gong, Yanwen; Srinivasan, Shushrruth Sai; Zhang, Ruiyi; Kessenbrock, Kai; Zhang, Jing |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | MDPI 2022 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9312957/ https://www.ncbi.nlm.nih.gov/pubmed/35883430 http://dx.doi.org/10.3390/biom12070874 |
_version_ | 1784753960599420928 |
---|---|
author | Gong, Yanwen Srinivasan, Shushrruth Sai Zhang, Ruiyi Kessenbrock, Kai Zhang, Jing |
author_facet | Gong, Yanwen Srinivasan, Shushrruth Sai Zhang, Ruiyi Kessenbrock, Kai Zhang, Jing |
author_sort | Gong, Yanwen |
collection | PubMed |
description | Recent advances in single-cell transposase-accessible chromatin using a sequencing assay (scATAC-seq) allow cellular heterogeneity dissection and regulatory landscape reconstruction with an unprecedented resolution. However, compared to bulk-sequencing, its ultra-high missingness remarkably reduces usable reads in each cell type, resulting in broader, fuzzier peak boundary definitions and limiting our ability to pinpoint functional regions and interpret variant impacts precisely. We propose a weakly supervised learning method, scEpiLock, to directly identify core functional regions from coarse peak labels and quantify variant impacts in a cell-type-specific manner. First, scEpiLock uses a multi-label classifier to predict chromatin accessibility via a deep convolutional neural network. Then, its weakly supervised object detection module further refines the peak boundary definition using gradient-weighted class activation mapping (Grad-CAM). Finally, scEpiLock provides cell-type-specific variant impacts within a given peak region. We applied scEpiLock to various scATAC-seq datasets and found that it achieves an area under receiver operating characteristic curve (AUC) of ~0.9 and an area under precision recall (AUPR) above 0.7. Besides, scEpiLock’s object detection condenses coarse peaks to only ⅓ of their original size while still reporting higher conservation scores. In addition, we applied scEpiLock on brain scATAC-seq data and reported several genome-wide association studies (GWAS) variants disrupting regulatory elements around known risk genes for Alzheimer’s disease, demonstrating its potential to provide cell-type-specific biological insights in disease studies. |
format | Online Article Text |
id | pubmed-9312957 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2022 |
publisher | MDPI |
record_format | MEDLINE/PubMed |
spelling | pubmed-93129572022-07-26 scEpiLock: A Weakly Supervised Learning Framework for cis-Regulatory Element Localization and Variant Impact Quantification for Single-Cell Epigenetic Data Gong, Yanwen Srinivasan, Shushrruth Sai Zhang, Ruiyi Kessenbrock, Kai Zhang, Jing Biomolecules Article Recent advances in single-cell transposase-accessible chromatin using a sequencing assay (scATAC-seq) allow cellular heterogeneity dissection and regulatory landscape reconstruction with an unprecedented resolution. However, compared to bulk-sequencing, its ultra-high missingness remarkably reduces usable reads in each cell type, resulting in broader, fuzzier peak boundary definitions and limiting our ability to pinpoint functional regions and interpret variant impacts precisely. We propose a weakly supervised learning method, scEpiLock, to directly identify core functional regions from coarse peak labels and quantify variant impacts in a cell-type-specific manner. First, scEpiLock uses a multi-label classifier to predict chromatin accessibility via a deep convolutional neural network. Then, its weakly supervised object detection module further refines the peak boundary definition using gradient-weighted class activation mapping (Grad-CAM). Finally, scEpiLock provides cell-type-specific variant impacts within a given peak region. We applied scEpiLock to various scATAC-seq datasets and found that it achieves an area under receiver operating characteristic curve (AUC) of ~0.9 and an area under precision recall (AUPR) above 0.7. Besides, scEpiLock’s object detection condenses coarse peaks to only ⅓ of their original size while still reporting higher conservation scores. In addition, we applied scEpiLock on brain scATAC-seq data and reported several genome-wide association studies (GWAS) variants disrupting regulatory elements around known risk genes for Alzheimer’s disease, demonstrating its potential to provide cell-type-specific biological insights in disease studies. 
MDPI 2022-06-23 /pmc/articles/PMC9312957/ /pubmed/35883430 http://dx.doi.org/10.3390/biom12070874 Text en © 2022 by the authors. https://creativecommons.org/licenses/by/4.0/ Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/). |
spellingShingle | Article Gong, Yanwen Srinivasan, Shushrruth Sai Zhang, Ruiyi Kessenbrock, Kai Zhang, Jing scEpiLock: A Weakly Supervised Learning Framework for cis-Regulatory Element Localization and Variant Impact Quantification for Single-Cell Epigenetic Data |
title | scEpiLock: A Weakly Supervised Learning Framework for cis-Regulatory Element Localization and Variant Impact Quantification for Single-Cell Epigenetic Data |
title_full | scEpiLock: A Weakly Supervised Learning Framework for cis-Regulatory Element Localization and Variant Impact Quantification for Single-Cell Epigenetic Data |
title_fullStr | scEpiLock: A Weakly Supervised Learning Framework for cis-Regulatory Element Localization and Variant Impact Quantification for Single-Cell Epigenetic Data |
title_full_unstemmed | scEpiLock: A Weakly Supervised Learning Framework for cis-Regulatory Element Localization and Variant Impact Quantification for Single-Cell Epigenetic Data |
title_short | scEpiLock: A Weakly Supervised Learning Framework for cis-Regulatory Element Localization and Variant Impact Quantification for Single-Cell Epigenetic Data |
title_sort | scepilock: a weakly supervised learning framework for cis-regulatory element localization and variant impact quantification for single-cell epigenetic data |
topic | Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9312957/ https://www.ncbi.nlm.nih.gov/pubmed/35883430 http://dx.doi.org/10.3390/biom12070874 |
work_keys_str_mv | AT gongyanwen scepilockaweaklysupervisedlearningframeworkforcisregulatoryelementlocalizationandvariantimpactquantificationforsinglecellepigeneticdata AT srinivasanshushrruthsai scepilockaweaklysupervisedlearningframeworkforcisregulatoryelementlocalizationandvariantimpactquantificationforsinglecellepigeneticdata AT zhangruiyi scepilockaweaklysupervisedlearningframeworkforcisregulatoryelementlocalizationandvariantimpactquantificationforsinglecellepigeneticdata AT kessenbrockkai scepilockaweaklysupervisedlearningframeworkforcisregulatoryelementlocalizationandvariantimpactquantificationforsinglecellepigeneticdata AT zhangjing scepilockaweaklysupervisedlearningframeworkforcisregulatoryelementlocalizationandvariantimpactquantificationforsinglecellepigeneticdata |
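
The description in this record says that scEpiLock refines coarse peak boundaries with gradient-weighted class activation mapping (Grad-CAM). As a rough illustration of that idea only, the sketch below applies the standard Grad-CAM formula to a one-dimensional sequence and then keeps the contiguous stretch around the strongest signal. The function names `grad_cam_1d` and `core_region`, the `fraction` threshold, and the toy arrays are all hypothetical; the record does not state how scEpiLock itself derives the refined boundary, so this should not be read as the authors' implementation.

```python
import numpy as np

def grad_cam_1d(activations, gradients):
    """Standard Grad-CAM on a 1D feature map.

    activations, gradients: arrays of shape (channels, positions), i.e. the
    output of a convolutional layer and the gradient of one class score with
    respect to that output (both assumed to be computed elsewhere).
    """
    weights = gradients.mean(axis=1)               # one weight per channel
    cam = np.maximum(weights @ activations, 0.0)   # ReLU of the weighted sum
    return cam

def core_region(cam, fraction=0.5):
    """Half-open (start, end) run around the CAM maximum.

    Keeps contiguous positions whose CAM value is at least
    fraction * max(cam). The thresholding rule is an assumption made
    for this sketch. Returns None when the map is all zeros.
    """
    if cam.max() <= 0:
        return None
    mask = cam >= fraction * cam.max()
    peak = int(cam.argmax())
    start = peak
    while start > 0 and mask[start - 1]:
        start -= 1
    end = peak + 1
    while end < len(cam) and mask[end]:
        end += 1
    return start, end

# Toy input: two channels over six positions; only channel 0 has gradient.
toy_activations = np.array([[0, 1, 4, 5, 1, 0],
                            [1, 1, 1, 1, 1, 1]], dtype=float)
toy_gradients = np.array([[1, 1, 1, 1, 1, 1],
                          [0, 0, 0, 0, 0, 0]], dtype=float)
toy_cam = grad_cam_1d(toy_activations, toy_gradients)
toy_core = core_region(toy_cam)   # positions 2 and 3 of the six-position peak
```

In the toy input the six-position "peak" shrinks to a two-position core, which mirrors in miniature the peak condensation the description reports. In a real setting the activations and gradients would come from the trained classifier for one cell type.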