
SCIBER: a simple method for removing batch effects from single-cell RNA-sequencing data

Bibliographic Details
Main authors: Gan, Dailin; Li, Jun
Format: Online Article Text
Language: English
Published: Oxford University Press 2022
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9848058/
https://www.ncbi.nlm.nih.gov/pubmed/36548380
http://dx.doi.org/10.1093/bioinformatics/btac819
Description
Summary: MOTIVATION: Integrative analysis of multiple single-cell RNA-sequencing datasets allows for more comprehensive characterizations of cell types, but systematic technical differences between datasets, known as ‘batch effects’, need to be removed before integration to avoid misleading interpretation of the data. Although many batch-effect-removal methods have been developed, there is still considerable room for improvement: most existing methods only give dimension-reduced data instead of expression data of individual genes, are based on computationally demanding models, and are black-box models that are difficult to interpret or tune. RESULTS: Here, we present a new batch-effect-removal method called SCIBER (Single-Cell Integrator and Batch Effect Remover) and study its performance on real datasets. SCIBER matches cell clusters across batches according to the overlap of their differentially expressed genes. As a simple algorithm that scales better to data with a large number of cells and is easy to tune, SCIBER shows comparable and sometimes better accuracy in removing batch effects on real datasets compared to the state-of-the-art methods, which are much more complicated. Moreover, SCIBER outputs expression data in the original space, that is, the expression of individual genes, which can be used directly for downstream analyses. Additionally, SCIBER is a reference-based method, which assigns one of the batches as the reference batch and keeps it untouched during the process, making it especially suitable for integrating user-generated datasets with standard reference data such as the Human Cell Atlas. AVAILABILITY AND IMPLEMENTATION: SCIBER is publicly available as an R package on CRAN: https://cran.r-project.org/web/packages/SCIBER/. A vignette is included in the CRAN R package. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
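
The cluster-matching idea described in the summary can be illustrated with a small Python sketch. This is not SCIBER's implementation (the package itself is written in R), and the summary does not say how the overlap is scored or thresholded. The sketch assumes that each cluster is already represented by a set of differentially expressed gene names, scores overlap as the raw count of shared genes, and treats the function name `match_clusters`, the parameter `min_shared`, and the toy gene sets as hypothetical choices made only for illustration.

```python
from typing import Dict, Optional, Set


def match_clusters(
    ref_markers: Dict[str, Set[str]],
    query_markers: Dict[str, Set[str]],
    min_shared: int = 1,
) -> Dict[str, Optional[str]]:
    """Pair each query-batch cluster with the reference-batch cluster
    that shares the most differentially expressed genes with it.

    Assumption: overlap is scored as the number of shared genes.
    Clusters sharing fewer than `min_shared` genes with every
    reference cluster are left unmatched (None).
    """
    matches: Dict[str, Optional[str]] = {}
    for q_name, q_genes in query_markers.items():
        best_ref, best_shared = None, 0
        for r_name, r_genes in ref_markers.items():
            shared = len(q_genes & r_genes)  # size of the set intersection
            if shared > best_shared:
                best_ref, best_shared = r_name, shared
        matches[q_name] = best_ref if best_shared >= min_shared else None
    return matches


# Toy marker sets (hypothetical); the reference batch is only read, never modified.
reference = {
    "T cells": {"CD3D", "CD3E", "IL7R"},
    "B cells": {"CD79A", "MS4A1", "CD19"},
}
query = {
    "c0": {"CD79A", "MS4A1", "IGHM"},
    "c1": {"CD3D", "IL7R", "CCR7"},
    "c2": {"HBB", "HBA1"},
}

matches = match_clusters(reference, query)
print(matches)
```

In this toy run, `c0` and `c1` each share two genes with one reference cluster, while `c2` shares none and stays unmatched. A count-based score was chosen here only because it is easy to inspect; a real analysis would follow the scoring and tuning options documented in the package vignette.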