simCAS: an embedding-based method for simulating single-cell chromatin accessibility sequencing data

Bibliographic Details
Main Authors: Li, Chen; Chen, Xiaoyang; Chen, Shengquan; Jiang, Rui; Zhang, Xuegong
Format: Online Article Text
Language: English
Published: Oxford University Press 2023
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10394124/
https://www.ncbi.nlm.nih.gov/pubmed/37494428
http://dx.doi.org/10.1093/bioinformatics/btad453
author Li, Chen
Chen, Xiaoyang
Chen, Shengquan
Jiang, Rui
Zhang, Xuegong
collection PubMed
description MOTIVATION: Single-cell chromatin accessibility sequencing (scCAS) technology provides an epigenomic perspective to characterize gene regulatory mechanisms at single-cell resolution. With an increasing number of computational methods proposed for analyzing scCAS data, a powerful simulation framework is desirable for evaluation and validation of these methods. However, existing simulators generate synthetic data by sampling reads from real data or mimicking existing cell states, which is inadequate for providing credible ground-truth labels for method evaluation. RESULTS: We present simCAS, an embedding-based simulator, for generating high-fidelity scCAS data from both cell- and peak-wise embeddings. We demonstrate that simCAS outperforms existing simulators in resembling real data and show that simCAS can generate cells of different states with user-defined cell populations and differentiation trajectories. Additionally, simCAS can simulate data from different batches and encode user-specified interactions of chromatin regions in the synthetic data, which provides ground-truth labels beyond cell states. We systematically demonstrate that simCAS facilitates the benchmarking of four core tasks in downstream analysis: cell clustering, trajectory inference, data integration, and cis-regulatory interaction inference. We anticipate simCAS will be a reliable and flexible simulator for evaluating the ongoing computational methods applied to scCAS data. AVAILABILITY AND IMPLEMENTATION: simCAS is freely available at https://github.com/Chen-Li-17/simCAS.
format Online
Article
Text
id pubmed-10394124
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-10394124 2023-08-03 simCAS: an embedding-based method for simulating single-cell chromatin accessibility sequencing data. Li, Chen; Chen, Xiaoyang; Chen, Shengquan; Jiang, Rui; Zhang, Xuegong. Bioinformatics. Original Paper. Oxford University Press 2023-07-26 /pmc/articles/PMC10394124/ /pubmed/37494428 http://dx.doi.org/10.1093/bioinformatics/btad453 Text en © The Author(s) 2023. Published by Oxford University Press.
This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
title simCAS: an embedding-based method for simulating single-cell chromatin accessibility sequencing data
topic Original Paper
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10394124/
https://www.ncbi.nlm.nih.gov/pubmed/37494428
http://dx.doi.org/10.1093/bioinformatics/btad453