Correspondence analysis for dimension reduction, batch integration, and visualization of single-cell RNA-seq data
Effective dimension reduction is essential for single cell RNA-seq (scRNAseq) analysis. Principal component analysis (PCA) is widely used, but requires continuous, normally-distributed data; therefore, it is often coupled with log-transformation in scRNAseq applications, which can distort the data a...
Main authors: | Hsu, Lauren L.; Culhane, Aedín C. |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Nature Publishing Group UK 2023 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9867729/ https://www.ncbi.nlm.nih.gov/pubmed/36681709 http://dx.doi.org/10.1038/s41598-022-26434-1 |
_version_ | 1784876408520048640 |
---|---|
author | Hsu, Lauren L. Culhane, Aedín C. |
collection | PubMed |
description | Effective dimension reduction is essential for single cell RNA-seq (scRNAseq) analysis. Principal component analysis (PCA) is widely used, but requires continuous, normally-distributed data; therefore, it is often coupled with log-transformation in scRNAseq applications, which can distort the data and obscure meaningful variation. We describe correspondence analysis (CA), a count-based alternative to PCA. CA is based on decomposition of a chi-squared residual matrix, avoiding distortive log-transformation. To address overdispersion and high sparsity in scRNAseq data, we propose five adaptations of CA, which are fast, scalable, and outperform standard CA and glmPCA, to compute cell embeddings with more performant or comparable clustering accuracy in 8 out of 9 datasets. In particular, we find that CA with Freeman–Tukey residuals performs especially well across diverse datasets. Other advantages of the CA framework include visualization of associations between genes and cell populations in a “CA biplot,” and extension to multi-table analysis; we introduce corralm for integrative multi-table dimension reduction of scRNAseq data. We implement CA for scRNAseq data in corral, an R/Bioconductor package which interfaces directly with single cell classes in Bioconductor. Switching from PCA to CA is achieved through a simple pipeline substitution and improves dimension reduction of scRNAseq datasets. |
format | Online Article Text |
id | pubmed-9867729 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2023 |
publisher | Nature Publishing Group UK |
record_format | MEDLINE/PubMed |
spelling | pubmed-9867729 2023-01-23 Hsu, Lauren L. Culhane, Aedín C. Sci Rep Article Nature Publishing Group UK 2023-01-21 /pmc/articles/PMC9867729/ /pubmed/36681709 http://dx.doi.org/10.1038/s41598-022-26434-1 Text en © The Author(s) 2023. Open Access: This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. To view a copy of this licence, visit https://creativecommons.org/licenses/by/4.0/. |
title | Correspondence analysis for dimension reduction, batch integration, and visualization of single-cell RNA-seq data |
topic | Article |
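
The description field above says that CA is based on decomposition of a chi-squared residual matrix, and that a variant with Freeman–Tukey residuals performs well. The following is a minimal sketch of that idea in Python with NumPy, not the corral implementation (corral is an R/Bioconductor package). The function names `ca_residuals` and `ca_embed`, the toy count matrix, and the exact form of the Freeman–Tukey residual (the classical deviate rescaled to proportions) are assumptions made for illustration.

```python
import numpy as np


def ca_residuals(counts, kind="pearson"):
    """Residual matrix that CA decomposes (hypothetical helper, not corral's API).

    counts: 2-D array of non-negative counts (e.g. genes x cells).
    Every row and column must have a non-zero sum, otherwise the
    expected values contain zeros and the Pearson residuals are undefined.
    """
    counts = np.asarray(counts, dtype=float)
    n = counts.sum()                      # grand total of the table
    p = counts / n                        # observed proportions
    r = p.sum(axis=1, keepdims=True)      # row masses
    c = p.sum(axis=0, keepdims=True)      # column masses
    e = r @ c                             # expected proportions under independence
    if kind == "pearson":
        # Standard CA: chi-squared (Pearson) residuals.
        return (p - e) / np.sqrt(e)
    if kind == "freeman-tukey":
        # Assumed form: classical Freeman-Tukey deviate
        # sqrt(x) + sqrt(x + 1) - sqrt(4m + 1), divided by sqrt(n).
        return np.sqrt(p) + np.sqrt(p + 1.0 / n) - np.sqrt(4.0 * e + 1.0 / n)
    raise ValueError(f"unknown residual kind: {kind!r}")


def ca_embed(counts, n_components=2, kind="pearson"):
    """SVD of the residual matrix; returns row vectors, singular values, column vectors."""
    s = ca_residuals(counts, kind)
    u, d, vt = np.linalg.svd(s, full_matrices=False)
    return u[:, :n_components], d[:n_components], vt[:n_components].T


# Toy genes x cells count matrix (invented for illustration only).
counts = np.array([
    [10, 0, 3],
    [2, 8, 0],
    [0, 1, 12],
    [5, 5, 5],
])

gene_vecs, sing_vals, cell_vecs = ca_embed(counts, n_components=2, kind="freeman-tukey")
print(cell_vecs.shape)  # one row per cell, one column per retained component
```

With Pearson residuals, the sum of squared residuals equals the chi-squared statistic of the table divided by its grand total, which is why CA is described as a decomposition of chi-squared structure. In a pipeline, the cell vectors would take the place of the PCA scores passed to clustering or visualization.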