scEpiLock: A Weakly Supervised Learning Framework for cis-Regulatory Element Localization and Variant Impact Quantification for Single-Cell Epigenetic Data

Recent advances in single-cell transposase-accessible chromatin using a sequencing assay (scATAC-seq) allow cellular heterogeneity dissection and regulatory landscape reconstruction with an unprecedented resolution. However, compared to bulk-sequencing, its ultra-high missingness remarkably reduces...

Bibliographic Details
Main Authors: Gong, Yanwen; Srinivasan, Shushrruth Sai; Zhang, Ruiyi; Kessenbrock, Kai; Zhang, Jing
Format: Online Article Text
Language: English
Published: MDPI 2022
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9312957/
https://www.ncbi.nlm.nih.gov/pubmed/35883430
http://dx.doi.org/10.3390/biom12070874
author Gong, Yanwen
Srinivasan, Shushrruth Sai
Zhang, Ruiyi
Kessenbrock, Kai
Zhang, Jing
collection PubMed
description Recent advances in the single-cell assay for transposase-accessible chromatin using sequencing (scATAC-seq) allow cellular heterogeneity dissection and regulatory landscape reconstruction at unprecedented resolution. However, compared to bulk sequencing, its ultra-high missingness markedly reduces the usable reads in each cell type, resulting in broader, fuzzier peak boundary definitions and limiting our ability to pinpoint functional regions and interpret variant impacts precisely. We propose a weakly supervised learning method, scEpiLock, to directly identify core functional regions from coarse peak labels and quantify variant impacts in a cell-type-specific manner. First, scEpiLock uses a multi-label classifier to predict chromatin accessibility via a deep convolutional neural network. Then, its weakly supervised object detection module further refines the peak boundary definition using gradient-weighted class activation mapping (Grad-CAM). Finally, scEpiLock provides cell-type-specific variant impacts within a given peak region. We applied scEpiLock to various scATAC-seq datasets and found that it achieves an area under the receiver operating characteristic curve (AUC) of ~0.9 and an area under the precision-recall curve (AUPR) above 0.7. Moreover, scEpiLock’s object detection condenses coarse peaks to only ⅓ of their original size while still reporting higher conservation scores. In addition, we applied scEpiLock to brain scATAC-seq data and reported several genome-wide association study (GWAS) variants disrupting regulatory elements around known risk genes for Alzheimer’s disease, demonstrating its potential to provide cell-type-specific biological insights in disease studies.
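The Grad-CAM refinement step named in the description can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names `grad_cam_1d` and `refine_peak`, the toy activations and gradients, and the 0.5 relative threshold are all assumptions made for illustration. It only shows the general Grad-CAM recipe (average the gradients per channel, take the weighted sum of feature maps, apply ReLU) and one simple way a coarse peak could be narrowed to a core region.

```python
import numpy as np

def grad_cam_1d(feature_maps, gradients):
    """Grad-CAM along a 1D sequence axis (hypothetical helper).

    feature_maps, gradients: arrays of shape (channels, positions) holding a
    convolutional layer's activations and the gradient of one cell type's
    score with respect to those activations.
    """
    weights = gradients.mean(axis=1)              # one importance weight per channel
    cam = np.maximum(weights @ feature_maps, 0)   # weighted channel sum, then ReLU
    return cam

def refine_peak(cam, rel_threshold=0.5):
    """Return (start, end) of the contiguous run around the CAM maximum whose
    values stay at or above rel_threshold * max; end is exclusive.
    The threshold rule is an assumption, not taken from the paper."""
    if cam.max() <= 0:
        return None
    cutoff = rel_threshold * cam.max()
    center = int(cam.argmax())
    start = center
    while start > 0 and cam[start - 1] >= cutoff:
        start -= 1
    end = center + 1
    while end < len(cam) and cam[end] >= cutoff:
        end += 1
    return start, end

# Toy data: 2 channels over 8 positions of a coarse peak (made-up values).
feature_maps = np.array([[0, 1, 4, 5, 4, 1, 0, 0],
                         [2, 2, 2, 2, 2, 2, 2, 2]], dtype=float)
gradients = np.array([[1.0] * 8, [-0.5] * 8])

cam = grad_cam_1d(feature_maps, gradients)
core = refine_peak(cam)
print(cam, core)
```

In this toy case the coarse 8-position window is narrowed to the 3 positions where the activation map is strongest; in a real model the feature maps and gradients would come from the trained classifier rather than hand-written arrays.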
format Online
Article
Text
id pubmed-9312957
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher MDPI
record_format MEDLINE/PubMed
spelling pubmed-9312957 2022-07-26 Biomolecules Article
MDPI 2022-06-23 /pmc/articles/PMC9312957/ /pubmed/35883430 http://dx.doi.org/10.3390/biom12070874 Text en © 2022 by the authors. https://creativecommons.org/licenses/by/4.0/ Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
title scEpiLock: A Weakly Supervised Learning Framework for cis-Regulatory Element Localization and Variant Impact Quantification for Single-Cell Epigenetic Data
topic Article