Correspondence analysis for dimension reduction, batch integration, and visualization of single-cell RNA-seq data

Effective dimension reduction is essential for single cell RNA-seq (scRNAseq) analysis. Principal component analysis (PCA) is widely used, but requires continuous, normally-distributed data; therefore, it is often coupled with log-transformation in scRNAseq applications, which can distort the data a...

Bibliographic Details
Main Authors: Hsu, Lauren L., Culhane, Aedín C.
Format: Online Article Text
Language: English
Published: Nature Publishing Group UK 2023
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9867729/
https://www.ncbi.nlm.nih.gov/pubmed/36681709
http://dx.doi.org/10.1038/s41598-022-26434-1
_version_ 1784876408520048640
author Hsu, Lauren L.
Culhane, Aedín C.
collection PubMed
description Effective dimension reduction is essential for single cell RNA-seq (scRNAseq) analysis. Principal component analysis (PCA) is widely used, but requires continuous, normally-distributed data; therefore, it is often coupled with log-transformation in scRNAseq applications, which can distort the data and obscure meaningful variation. We describe correspondence analysis (CA), a count-based alternative to PCA. CA is based on decomposition of a chi-squared residual matrix, avoiding distortive log-transformation. To address overdispersion and high sparsity in scRNAseq data, we propose five adaptations of CA, which are fast, scalable, and outperform standard CA and glmPCA, to compute cell embeddings with more performant or comparable clustering accuracy in 8 out of 9 datasets. In particular, we find that CA with Freeman–Tukey residuals performs especially well across diverse datasets. Other advantages of the CA framework include visualization of associations between genes and cell populations in a “CA biplot,” and extension to multi-table analysis; we introduce corralm for integrative multi-table dimension reduction of scRNAseq data. We implement CA for scRNAseq data in corral, an R/Bioconductor package which interfaces directly with single cell classes in Bioconductor. Switching from PCA to CA is achieved through a simple pipeline substitution and improves dimension reduction of scRNAseq datasets.
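The description above says that CA decomposes a chi-squared residual matrix and that a Freeman–Tukey variant performs well. The following is a minimal NumPy sketch of that idea, not the corral implementation: the function names `ca_residuals` and `ca_embed`, the genes-by-cells orientation, and the toy matrix are all assumptions made for illustration. The Freeman–Tukey branch uses the classical Freeman–Tukey deviate rescaled to proportions, which may differ in detail from what the package does.

```python
import numpy as np


def ca_residuals(counts, kind="pearson"):
    """Residual matrix for a count table (hypothetical helper, not corral's API).

    Assumes rows are genes, columns are cells, and no row or column is all zero.
    """
    X = np.asarray(counts, dtype=float)
    N = X.sum()
    P = X / N                            # observed proportions
    r = P.sum(axis=1, keepdims=True)     # row masses (genes)
    c = P.sum(axis=0, keepdims=True)     # column masses (cells)
    E = r @ c                            # expected proportions under independence
    if kind == "pearson":
        # Standard CA: chi-squared (Pearson) residuals
        return (P - E) / np.sqrt(E)
    if kind == "freeman_tukey":
        # Classical Freeman-Tukey deviate, rescaled from counts to proportions
        return np.sqrt(P) + np.sqrt(P + 1.0 / N) - np.sqrt(4.0 * E + 1.0 / N)
    raise ValueError(f"unknown residual kind: {kind!r}")


def ca_embed(counts, n_components=2, kind="pearson"):
    """SVD of the residual matrix; returns (gene coords, cell coords, singular values)."""
    S = ca_residuals(counts, kind)
    U, d, Vt = np.linalg.svd(S, full_matrices=False)
    k = n_components
    gene_coords = U[:, :k] * d[:k]       # row (gene) embedding
    cell_coords = Vt[:k].T * d[:k]       # column (cell) embedding
    return gene_coords, cell_coords, d[:k]


# Toy 4-gene x 3-cell count matrix (made-up numbers, for illustration only)
toy = np.array([[10, 0, 3],
                [2, 8, 1],
                [0, 5, 7],
                [4, 1, 0]])

genes, cells, sing_vals = ca_embed(toy, n_components=2, kind="freeman_tukey")
```

Both residual types operate directly on counts, so no log-transformation is involved. In this sketch, switching between standard CA and the Freeman–Tukey variant is only a change of the `kind` argument.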
format Online
Article
Text
id pubmed-9867729
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher Nature Publishing Group UK
record_format MEDLINE/PubMed
spelling pubmed-9867729 2023-01-23 Correspondence analysis for dimension reduction, batch integration, and visualization of single-cell RNA-seq data Hsu, Lauren L. Culhane, Aedín C. Sci Rep Article
Nature Publishing Group UK 2023-01-21 /pmc/articles/PMC9867729/ /pubmed/36681709 http://dx.doi.org/10.1038/s41598-022-26434-1 Text en © The Author(s) 2023
Open Access: This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit https://creativecommons.org/licenses/by/4.0/.
title Correspondence analysis for dimension reduction, batch integration, and visualization of single-cell RNA-seq data
topic Article