A comparison of methods accounting for batch effects in differential expression analysis of UMI count based single cell RNA sequencing

Bibliographic Details
Main authors: Chen, Wenan; Zhang, Silu; Williams, Justin; Ju, Bensheng; Shaner, Bridget; Easton, John; Wu, Gang; Chen, Xiang
Format: Online, Article, Text
Language: English
Published: Research Network of Computational and Structural Biotechnology, 2020
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7163294/
https://www.ncbi.nlm.nih.gov/pubmed/32322368
http://dx.doi.org/10.1016/j.csbj.2020.03.026
collection PubMed
description Accounting for batch effects, especially latent batch effects, in differential expression (DE) analysis is critical for identifying true biological effects. Single-cell RNA sequencing (scRNA-seq) is a powerful tool for quantifying cell-to-cell variation in transcript abundance and characterizing cellular dynamics. Although many scRNA-seq DE analysis methods accommodate known batch variables, their performance has not been systematically evaluated. Moreover, the challenge of accounting for latent batch variables in scRNA-seq DE analysis is largely unmet. In contrast, many methods have been developed to account for batch variables (either known or latent) in other high-dimensional data, especially bulk RNA-seq. We extensively evaluate 11 methods for batch variables in different scRNA-seq DE analysis scenarios, with a primary focus on latent batch variables. We demonstrate that for known batch variables, incorporating them as covariates into a regression model outperformed approaches using a batch-corrected matrix. For latent batches, fixed effects models have inflated FDRs, whereas aggregation-based methods and mixed effects models have significant power loss. Surrogate variable based methods generally control the FDR well while achieving good power with small group effects. However, their performance (except that of SVA) deteriorated substantially in scenarios involving large group effects and/or group label impurity. In these settings, SVA achieves relatively good performance despite an occasionally inflated FDR (up to 0.2). Finally we make the following recommendations for scRNA-seq DE analysis: 1) incorporate known batch variables instead of using batch-corrected data; and 2) employ SVA for latent batch correction. However, better methods are still needed to fully unleash the power of scRNA-seq.
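The first recommendation above (include known batch variables as covariates in the regression model) can be illustrated with a minimal sketch. It is a simplified, hypothetical stand-in: it assumes a single gene, expression values already on a log scale, and an ordinary least-squares fit, not the count-based models benchmarked in the paper. All function names and the toy data are illustrative assumptions, and the sketch does not reproduce the batch-corrected-matrix approaches the paper compares against.

```python
"""Hypothetical illustration: adjusting for a known batch as a regression covariate.

Assumption: ordinary least squares on log-scale expression for ONE gene,
not the count-based GLMs evaluated in the paper.
"""
from typing import List, Sequence


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve the square system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Work on an augmented copy so the caller's data is not modified.
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        # Move the row with the largest pivot magnitude into place.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        # Eliminate entries below the pivot.
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back-substitution on the upper-triangular system.
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def ols(design: Sequence[Sequence[float]], y: Sequence[float]) -> List[float]:
    """Least-squares coefficients via the normal equations X^T X beta = X^T y."""
    p = len(design[0])
    xtx = [[sum(row[i] * row[j] for row in design) for j in range(p)] for i in range(p)]
    xty = [sum(row[i] * yi for row, yi in zip(design, y)) for i in range(p)]
    return solve(xtx, xty)


# Hypothetical toy data: (group, batch) labels for 10 cells. Group 1 cells sit
# mostly in batch 1, so group and batch are partially confounded.
cells = [(0, 0)] * 4 + [(0, 1)] + [(1, 0)] + [(1, 1)] * 4
# Noise-free expression with NO true group effect and a batch shift of 2.0.
expr = [1.0 + 2.0 * b for _, b in cells]

# Model 1: intercept + group only (batch ignored).
beta_no_batch = ols([[1.0, g] for g, _ in cells], expr)
# Model 2: intercept + group + batch (known batch included as a covariate).
beta_with_batch = ols([[1.0, g, b] for g, b in cells], expr)

print(f"group effect, batch ignored:  {beta_no_batch[1]:.3f}")
print(f"group effect, batch adjusted: {beta_with_batch[1]:.3f}")
```

In this toy design the group-only model attributes most of the batch shift to the group (a group coefficient of 1.2) even though no group effect was simulated, while the model with batch as a covariate recovers a group coefficient of approximately zero and a batch coefficient of 2.0.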
id pubmed-7163294
institution National Center for Biotechnology Information
record_format MEDLINE/PubMed
journal Comput Struct Biotechnol J (Research Article), 2020-03-30
rights © 2020 The Authors. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).
topic Research Article