simCAS: an embedding-based method for simulating single-cell chromatin accessibility sequencing data
MOTIVATION: Single-cell chromatin accessibility sequencing (scCAS) technology provides an epigenomic perspective to characterize gene regulatory mechanisms at single-cell resolution. With an increasing number of computational methods proposed for analyzing scCAS data, a powerful simulation framework...
Main authors: | Li, Chen; Chen, Xiaoyang; Chen, Shengquan; Jiang, Rui; Zhang, Xuegong |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Oxford University Press 2023 |
Subjects: | Original Paper |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10394124/ https://www.ncbi.nlm.nih.gov/pubmed/37494428 http://dx.doi.org/10.1093/bioinformatics/btad453 |
_version_ | 1785083297241497600 |
---|---|
author | Li, Chen Chen, Xiaoyang Chen, Shengquan Jiang, Rui Zhang, Xuegong |
author_sort | Li, Chen |
collection | PubMed |
description | MOTIVATION: Single-cell chromatin accessibility sequencing (scCAS) technology provides an epigenomic perspective for characterizing gene regulatory mechanisms at single-cell resolution. With an increasing number of computational methods proposed for analyzing scCAS data, a powerful simulation framework is desirable for evaluating and validating these methods. However, existing simulators generate synthetic data by sampling reads from real data or mimicking existing cell states, which is inadequate for providing credible ground-truth labels for method evaluation. RESULTS: We present simCAS, an embedding-based simulator for generating high-fidelity scCAS data from both cell-wise and peak-wise embeddings. We demonstrate that simCAS outperforms existing simulators in resembling real data and show that simCAS can generate cells of different states with user-defined cell populations and differentiation trajectories. Additionally, simCAS can simulate data from different batches and encode user-specified interactions of chromatin regions in the synthetic data, which provides ground-truth labels beyond cell states. We systematically demonstrate that simCAS facilitates the benchmarking of four core tasks in downstream analysis: cell clustering, trajectory inference, data integration, and cis-regulatory interaction inference. We anticipate that simCAS will be a reliable and flexible simulator for evaluating computational methods being developed for scCAS data. AVAILABILITY AND IMPLEMENTATION: simCAS is freely available at https://github.com/Chen-Li-17/simCAS. |
format | Online Article Text |
id | pubmed-10394124 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2023 |
publisher | Oxford University Press |
record_format | MEDLINE/PubMed |
spelling | pubmed-10394124 2023-08-03 simCAS: an embedding-based method for simulating single-cell chromatin accessibility sequencing data Li, Chen Chen, Xiaoyang Chen, Shengquan Jiang, Rui Zhang, Xuegong Bioinformatics Original Paper Oxford University Press 2023-07-26 /pmc/articles/PMC10394124/ /pubmed/37494428 http://dx.doi.org/10.1093/bioinformatics/btad453 Text en © The Author(s) 2023. Published by Oxford University Press. https://creativecommons.org/licenses/by/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited. |
spellingShingle | Original Paper Li, Chen Chen, Xiaoyang Chen, Shengquan Jiang, Rui Zhang, Xuegong simCAS: an embedding-based method for simulating single-cell chromatin accessibility sequencing data |
title | simCAS: an embedding-based method for simulating single-cell chromatin accessibility sequencing data |
title_sort | simcas: an embedding-based method for simulating single-cell chromatin accessibility sequencing data |
topic | Original Paper |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10394124/ https://www.ncbi.nlm.nih.gov/pubmed/37494428 http://dx.doi.org/10.1093/bioinformatics/btad453 |