
Signal recovery in single cell batch integration

Bibliographic Details
Main authors: Zhang, Zhaojun; Mathew, Divij; Lim, Tristan; Mason, Kaishu; Martinez, Clara Morral; Huang, Sijia; Wherry, E. John; Susztak, Katalin; Minn, Andy J.; Ma, Zongming; Zhang, Nancy R.
Format: Online, Article, Text
Language: English
Published: Cold Spring Harbor Laboratory, 2023
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10197537/
https://www.ncbi.nlm.nih.gov/pubmed/37215021
http://dx.doi.org/10.1101/2023.05.05.539614
author Zhang, Zhaojun
Mathew, Divij
Lim, Tristan
Mason, Kaishu
Martinez, Clara Morral
Huang, Sijia
Wherry, E. John
Susztak, Katalin
Minn, Andy J.
Ma, Zongming
Zhang, Nancy R.
collection PubMed
description Data integration to align cells across batches has become a cornerstone of single cell data analysis, critically affecting downstream results. Yet, how much biological signal is erased during integration? Currently, there are no guidelines for when the biological differences between samples are separable from batch effects, and thus, data integration usually involves a lot of guesswork: Cells across batches should be aligned to be “appropriately” mixed, while preserving “main cell type clusters”. We show evidence that current paradigms for single cell data integration are unnecessarily aggressive, removing biologically meaningful variation. To remedy this, we present a novel statistical model and computationally scalable algorithm, CellANOVA, to recover biological signal that is lost during single cell data integration. CellANOVA utilizes a “pool-of-controls” design concept, applicable across diverse settings, to separate unwanted variation from biological variation of interest. When applied with existing integration methods, CellANOVA allows the recovery of subtle biological signals and corrects, to a large extent, the data distortion introduced by integration. Further, CellANOVA explicitly estimates cell- and gene-specific batch effect terms which can be used to identify the cell types and pathways exhibiting the largest batch variations, providing clarity as to which biological signals can be recovered. These concepts are illustrated on studies of diverse designs, where the biological signals that are recovered by CellANOVA are shown to be validated by orthogonal assays. In particular, we show that CellANOVA is effective in the challenging case of single-cell and single-nuclei data integration, where the recovered biological signals are replicated in an independent study.
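The abstract describes the "pool-of-controls" idea only in outline. As a hedged illustration of why it matters, the Python sketch below is a deliberately simplified toy of our own, not the CellANOVA model or its algorithm. It assumes batch effects are a purely additive shift per batch and per gene, and that every batch contains control cells that carry no treatment effect. It contrasts aggressive alignment, which subtracts each batch's mean over all of its cells, with a control-anchored correction that estimates each batch's shift from that batch's controls only. The simulated design and all function names (`simulate`, `center_each_batch`, `center_on_controls`, `treatment_effect`) are hypothetical.

```python
import numpy as np

# Toy illustration only: NOT the CellANOVA model. Assumes batch effects are an
# additive shift per (batch, gene) and that control cells carry no treatment effect.


def simulate(rng, n_genes=5, effect=2.0, noise_sd=0.1):
    """Two batches: batch 0 has 100 control cells; batch 1 has 20 control
    and 80 treated cells. Treatment raises gene 0 by `effect`."""
    batch_shift = rng.normal(0.0, 3.0, size=(2, n_genes))  # unwanted variation
    batch = np.repeat([0, 1], 100)
    treated = np.zeros(200, dtype=bool)
    treated[120:] = True  # last 80 cells of batch 1 are treated
    X = batch_shift[batch] + rng.normal(0.0, noise_sd, size=(200, n_genes))
    X[treated, 0] += effect  # biological signal of interest
    return X, batch, treated


def center_each_batch(X, batch):
    """Aggressive alignment: subtract each batch's mean over ALL its cells."""
    Xc = X.copy()
    for b in np.unique(batch):
        Xc[batch == b] -= X[batch == b].mean(axis=0)
    return Xc


def center_on_controls(X, batch, is_control):
    """Control-anchored toy: estimate each batch's offset from its control
    cells only, then subtract that offset from every cell in the batch."""
    Xc = X.copy()
    for b in np.unique(batch):
        in_b = batch == b
        offset = X[in_b & is_control].mean(axis=0)
        Xc[in_b] -= offset
    return Xc


def treatment_effect(Xc, treated, gene=0):
    """Mean difference between treated and untreated cells for one gene."""
    return Xc[treated, gene].mean() - Xc[~treated, gene].mean()


rng = np.random.default_rng(0)
X, batch, treated = simulate(rng)
naive = treatment_effect(center_each_batch(X, batch), treated)
anchored = treatment_effect(center_on_controls(X, batch, ~treated), treated)
print(f"per-batch centering: {naive:.2f}  control-anchored: {anchored:.2f}")
```

In this design the control-anchored estimate should land close to the simulated effect of 2.0, while per-batch centering shrinks it to roughly a third, because batch 1's mean absorbs most of the treatment shift. Real single-cell batch effects are rarely a single additive shift; the toy only shows why anchoring on controls can protect signal that whole-batch alignment removes.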
format Online
Article
Text
id pubmed-10197537
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher Cold Spring Harbor Laboratory
record_format MEDLINE/PubMed
spelling pubmed-10197537 2023-05-20 Signal recovery in single cell batch integration. bioRxiv Article. Cold Spring Harbor Laboratory 2023-09-23 /pmc/articles/PMC10197537/ /pubmed/37215021 http://dx.doi.org/10.1101/2023.05.05.539614 Text en https://creativecommons.org/licenses/by-nc-nd/4.0/ This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License (https://creativecommons.org/licenses/by-nc-nd/4.0/), which allows reusers to copy and distribute the material in any medium or format in unadapted form only, for noncommercial purposes only, and only so long as attribution is given to the creator.
title Signal recovery in single cell batch integration
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10197537/
https://www.ncbi.nlm.nih.gov/pubmed/37215021
http://dx.doi.org/10.1101/2023.05.05.539614