Supervised dimensionality reduction for exploration of single-cell data by HSS-LDA

Bibliographic Details
Main authors: Amouzgar, Meelad; Glass, David R.; Baskar, Reema; Averbukh, Inna; Kimmey, Samuel C.; Tsai, Albert G.; Hartmann, Felix J.; Bendall, Sean C.
Format: Online Article Text
Language: English
Published: Elsevier 2022
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9403402/
https://www.ncbi.nlm.nih.gov/pubmed/36033591
http://dx.doi.org/10.1016/j.patter.2022.100536
Description
Summary: Single-cell technologies generate large, high-dimensional datasets encompassing a diversity of omics. Dimensionality reduction captures the structure and heterogeneity of the original dataset, creating low-dimensional visualizations that contribute to the human understanding of data. Existing algorithms are typically unsupervised, using measured features to generate manifolds, disregarding known biological labels such as cell type or experimental time point. We repurpose the classification algorithm, linear discriminant analysis (LDA), for supervised dimensionality reduction of single-cell data. LDA identifies linear combinations of predictors that optimally separate a priori classes, enabling the study of specific aspects of cellular heterogeneity. We implement feature selection by hybrid subset selection (HSS) and demonstrate that this computationally efficient approach generates non-stochastic, interpretable axes amenable to diverse biological processes such as differentiation over time and cell cycle. We benchmark HSS-LDA against several popular dimensionality-reduction algorithms and illustrate its utility and versatility for the exploration of single-cell mass cytometry, transcriptomics, and chromatin accessibility data.
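
Illustrative sketch (not part of the catalog record or of the authors' software): the summary describes LDA as finding linear combinations of predictors that separate a priori classes. For the simplest case of two classes, this reduces to Fisher's discriminant direction w = Sw^-1 (mean_b - mean_a), where Sw is the pooled within-class scatter matrix. The pure-Python example below assumes a toy dataset of two hypothetical cell classes measured on three hypothetical markers; all values and function names are invented for illustration, and the HSS feature-selection step is not shown.

```python
def solve_linear(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("within-class scatter matrix is singular")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def class_mean(rows):
    """Per-feature mean of a list of equally long feature vectors."""
    return [sum(col) / len(col) for col in zip(*rows)]


def fisher_axis(class_a, class_b):
    """Two-class LDA direction: w = Sw^-1 (mean_b - mean_a)."""
    mean_a, mean_b = class_mean(class_a), class_mean(class_b)
    p = len(mean_a)
    sw = [[0.0] * p for _ in range(p)]  # pooled within-class scatter
    for rows, mean in ((class_a, mean_a), (class_b, mean_b)):
        for row in rows:
            d = [x - mu for x, mu in zip(row, mean)]
            for i in range(p):
                for j in range(p):
                    sw[i][j] += d[i] * d[j]
    diff = [b - a for a, b in zip(mean_a, mean_b)]
    return solve_linear(sw, diff)


def project(rows, w):
    """Project each cell onto the discriminant axis (one score per cell)."""
    return [sum(x * wi for x, wi in zip(row, w)) for row in rows]


# Hypothetical marker intensities (rows = cells, columns = 3 markers).
cells_a = [[5.0, 1.0, 2.0], [5.5, 1.2, 2.4], [4.8, 0.9, 1.7], [5.2, 1.3, 2.1]]
cells_b = [[1.0, 4.0, 2.2], [1.3, 4.4, 1.8], [0.8, 3.9, 2.5], [1.1, 4.2, 2.0]]

axis = fisher_axis(cells_a, cells_b)
scores_a = project(cells_a, axis)
scores_b = project(cells_b, axis)
print("axis weights:", axis)
print("class A scores:", scores_a)
print("class B scores:", scores_b)
```

The axis weights show how much each marker contributes to the separation, which is the sense in which LDA axes are interpretable, and rerunning the code gives identical weights because nothing in it is random. With more than two classes, LDA yields up to (number of classes - 1) such axes by solving a generalized eigenvalue problem instead of a single linear system.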