Signal recovery in single cell batch integration
Data integration to align cells across batches has become a cornerstone of single cell data analysis, critically affecting downstream results. Yet, how much biological signal is erased during integration? Currently, there are no guidelines for when the biological differences between samples are sepa...
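Main authors: | Zhang, Zhaojun, Mathew, Divij, Lim, Tristan, Mason, Kaishu, Martinez, Clara Morral, Huang, Sijia, Wherry, E. John, Susztak, Katalin, Minn, Andy J., Ma, Zongming, Zhang, Nancy R. |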
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Cold Spring Harbor Laboratory 2023 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10197537/ https://www.ncbi.nlm.nih.gov/pubmed/37215021 http://dx.doi.org/10.1101/2023.05.05.539614 |
_version_ | 1785044572152266752 |
---|---|
author | Zhang, Zhaojun Mathew, Divij Lim, Tristan Mason, Kaishu Martinez, Clara Morral Huang, Sijia Wherry, E. John Susztak, Katalin Minn, Andy J. Ma, Zongming Zhang, Nancy R. |
author_sort | Zhang, Zhaojun |
collection | PubMed |
description | Data integration to align cells across batches has become a cornerstone of single cell data analysis, critically affecting downstream results. Yet, how much biological signal is erased during integration? Currently, there are no guidelines for when the biological differences between samples are separable from batch effects, and thus, data integration usually involves a lot of guesswork: Cells across batches should be aligned to be “appropriately” mixed, while preserving “main cell type clusters”. We show evidence that current paradigms for single cell data integration are unnecessarily aggressive, removing biologically meaningful variation. To remedy this, we present a novel statistical model and computationally scalable algorithm, CellANOVA, to recover biological signal that is lost during single cell data integration. CellANOVA utilizes a “pool-of-controls” design concept, applicable across diverse settings, to separate unwanted variation from biological variation of interest. When applied with existing integration methods, CellANOVA allows the recovery of subtle biological signals and corrects, to a large extent, the data distortion introduced by integration. Further, CellANOVA explicitly estimates cell- and gene-specific batch effect terms which can be used to identify the cell types and pathways exhibiting the largest batch variations, providing clarity as to which biological signals can be recovered. These concepts are illustrated on studies of diverse designs, where the biological signals that are recovered by CellANOVA are shown to be validated by orthogonal assays. In particular, we show that CellANOVA is effective in the challenging case of single-cell and single-nuclei data integration, where the recovered biological signals are replicated in an independent study. |
format | Online Article Text |
id | pubmed-10197537 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2023 |
publisher | Cold Spring Harbor Laboratory |
record_format | MEDLINE/PubMed |
spelling | pubmed-10197537 2023-05-20 Signal recovery in single cell batch integration Zhang, Zhaojun Mathew, Divij Lim, Tristan Mason, Kaishu Martinez, Clara Morral Huang, Sijia Wherry, E. John Susztak, Katalin Minn, Andy J. Ma, Zongming Zhang, Nancy R. bioRxiv Article Data integration to align cells across batches has become a cornerstone of single cell data analysis, critically affecting downstream results. Yet, how much biological signal is erased during integration? Currently, there are no guidelines for when the biological differences between samples are separable from batch effects, and thus, data integration usually involves a lot of guesswork: Cells across batches should be aligned to be “appropriately” mixed, while preserving “main cell type clusters”. We show evidence that current paradigms for single cell data integration are unnecessarily aggressive, removing biologically meaningful variation. To remedy this, we present a novel statistical model and computationally scalable algorithm, CellANOVA, to recover biological signal that is lost during single cell data integration. CellANOVA utilizes a “pool-of-controls” design concept, applicable across diverse settings, to separate unwanted variation from biological variation of interest. When applied with existing integration methods, CellANOVA allows the recovery of subtle biological signals and corrects, to a large extent, the data distortion introduced by integration. Further, CellANOVA explicitly estimates cell- and gene-specific batch effect terms which can be used to identify the cell types and pathways exhibiting the largest batch variations, providing clarity as to which biological signals can be recovered. These concepts are illustrated on studies of diverse designs, where the biological signals that are recovered by CellANOVA are shown to be validated by orthogonal assays. In particular, we show that CellANOVA is effective in the challenging case of single-cell and single-nuclei data integration, where the recovered biological signals are replicated in an independent study. Cold Spring Harbor Laboratory 2023-09-23 /pmc/articles/PMC10197537/ /pubmed/37215021 http://dx.doi.org/10.1101/2023.05.05.539614 Text en https://creativecommons.org/licenses/by-nc-nd/4.0/ This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License (https://creativecommons.org/licenses/by-nc-nd/4.0/), which allows reusers to copy and distribute the material in any medium or format in unadapted form only, for noncommercial purposes only, and only so long as attribution is given to the creator. |
spellingShingle | Article Zhang, Zhaojun Mathew, Divij Lim, Tristan Mason, Kaishu Martinez, Clara Morral Huang, Sijia Wherry, E. John Susztak, Katalin Minn, Andy J. Ma, Zongming Zhang, Nancy R. Signal recovery in single cell batch integration |
title | Signal recovery in single cell batch integration |
topic | Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10197537/ https://www.ncbi.nlm.nih.gov/pubmed/37215021 http://dx.doi.org/10.1101/2023.05.05.539614 |