Joint analysis of scATAC-seq datasets using epiConv
BACKGROUND: Technical improvements in ATAC-seq make high-throughput profiling of the chromatin states of single cells possible. However, data from multiple sources frequently show strong technical variations, which are referred to as batch effects. In order to perform joint analysis across multip...
Main authors: | Lin, Li; Zhang, Liye |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central 2022 |
Subjects: | Research |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9338487/ https://www.ncbi.nlm.nih.gov/pubmed/35906531 http://dx.doi.org/10.1186/s12859-022-04858-w |
_version_ | 1784759978264887296 |
---|---|
author | Lin, Li Zhang, Liye |
author_facet | Lin, Li Zhang, Liye |
author_sort | Lin, Li |
collection | PubMed |
description | BACKGROUND: Technical improvements in ATAC-seq make high-throughput profiling of the chromatin states of single cells possible. However, data from multiple sources frequently show strong technical variations, which are referred to as batch effects. To perform joint analysis across multiple datasets, a specialized method is required that removes technical variation between datasets while keeping biological information. RESULTS: Here we present an algorithm named epiConv to perform joint analyses on scATAC-seq datasets. We first show, on a collection of PBMC datasets, that epiConv corrects batch effects better and is less prone to over-fitting than existing methods. In a collection of mouse brain data, we show that epiConv is capable of aligning low-depth scATAC-seq from co-assay data (simultaneous profiling of transcriptome and chromatin) onto a high-quality ATAC-seq reference and of increasing the resolution of the chromatin profiles of the co-assay data. Finally, we show that epiConv can be used to integrate cells from different biological conditions (T cells in normal vs. germ-free mouse; normal vs. malignant hematopoiesis), which reveals hidden cell populations that would otherwise be undetectable. CONCLUSIONS: In this study, we introduce epiConv to integrate multiple scATAC-seq datasets and perform joint analysis on them. Through several case studies, we show that epiConv removes batch effects and retains the biological signal. Moreover, joint analysis across multiple datasets improves the performance of clustering and of differentially accessible peak calling, especially when the biological signal is weak in a single dataset. SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1186/s12859-022-04858-w. |
format | Online Article Text |
id | pubmed-9338487 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2022 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-9338487 2022-07-31 Joint analysis of scATAC-seq datasets using epiConv Lin, Li Zhang, Liye BMC Bioinformatics Research
BioMed Central 2022-07-29 /pmc/articles/PMC9338487/ /pubmed/35906531 http://dx.doi.org/10.1186/s12859-022-04858-w Text en © The Author(s) 2022 https://creativecommons.org/licenses/by/4.0/ Open Access: This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data. |
spellingShingle | Research Lin, Li Zhang, Liye Joint analysis of scATAC-seq datasets using epiConv |
title | Joint analysis of scATAC-seq datasets using epiConv |
title_sort | joint analysis of scatac-seq datasets using epiconv |
topic | Research |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9338487/ https://www.ncbi.nlm.nih.gov/pubmed/35906531 http://dx.doi.org/10.1186/s12859-022-04858-w |
work_keys_str_mv | AT linli jointanalysisofscatacseqdatasetsusingepiconv AT zhangliye jointanalysisofscatacseqdatasetsusingepiconv |