
Model-based dimensionality reduction for single-cell RNA-seq using generalized bilinear models


Bibliographic Details
Main Authors: Nicol, Phillip B., Miller, Jeffrey W.
Format: Online Article Text
Language: English
Published: Cold Spring Harbor Laboratory 2023
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10168202/
https://www.ncbi.nlm.nih.gov/pubmed/37162914
http://dx.doi.org/10.1101/2023.04.21.537881
Description
Summary: Dimensionality reduction is a critical step in the analysis of single-cell RNA-seq data. The standard approach is to apply a transformation to the count matrix, followed by principal components analysis. However, this approach can spuriously indicate heterogeneity where it does not exist and mask true heterogeneity where it does exist. An alternative approach is to directly model the counts, but existing model-based methods tend to be computationally intractable on large datasets and do not quantify uncertainty in the low-dimensional representation. To address these problems, we develop scGBM, a novel method for model-based dimensionality reduction of single-cell RNA-seq data. scGBM employs a scalable algorithm to fit a Poisson bilinear model to datasets with millions of cells and quantifies the uncertainty in each cell’s latent position. Furthermore, scGBM leverages these uncertainties to assess the confidence associated with a given cell clustering. On real and simulated single-cell data, we find that scGBM produces low-dimensional embeddings that better capture relevant biological information while removing unwanted variation. scGBM is publicly available as an R package.
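
For readers unfamiliar with the "standard approach" the summary refers to (a transformation of the count matrix followed by principal components analysis), the following is a minimal Python sketch of that baseline. It is not scGBM and is not taken from the paper. The function name `log_pca`, the depth normalization, the `scale_factor` value, and the simulated Poisson counts are all assumptions chosen for illustration.

```python
import numpy as np

def log_pca(counts, n_components=2, scale_factor=1e4):
    """Hypothetical baseline: depth-normalize, log-transform, then PCA.

    counts: array of shape (cells, genes) with nonnegative integer counts.
    Returns the PCA scores, shape (cells, n_components).
    """
    counts = np.asarray(counts, dtype=float)

    # Normalize each cell by its total count (assumed normalization choice).
    depth = counts.sum(axis=1, keepdims=True)
    depth[depth == 0] = 1.0  # avoid division by zero for empty cells
    normalized = counts / depth * scale_factor

    # Log transformation of the normalized counts: log(1 + x).
    logged = np.log1p(normalized)

    # PCA via SVD of the column-centered matrix.
    centered = logged - logged.mean(axis=0, keepdims=True)
    U, S, Vt = np.linalg.svd(centered, full_matrices=False)

    # Scores are the projections onto the leading principal axes.
    return U[:, :n_components] * S[:n_components]

# Simulated data with no true heterogeneity: every entry has the same rate.
rng = np.random.default_rng(0)
counts = rng.poisson(lam=2.0, size=(50, 20))
scores = log_pca(counts, n_components=2)
print(scores.shape)
```

A model-based method such as the one described in the summary would instead fit a count model directly to `counts` rather than transforming them first.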