Cargando…
A robust nonlinear low-dimensional manifold for single cell RNA-seq data
BACKGROUND: Modern developments in single-cell sequencing technologies enable broad insights into cellular state. Single-cell RNA sequencing (scRNA-seq) can be used to explore cell types, states, and developmental trajectories to broaden our understanding of cellular heterogeneity in tissues and org...
Autores principales: | , |
---|---|
Formato: | Online Artículo Texto |
Lenguaje: | English |
Publicado: |
BioMed Central
2020
|
Materias: | |
Acceso en línea: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7374962/ https://www.ncbi.nlm.nih.gov/pubmed/32693778 http://dx.doi.org/10.1186/s12859-020-03625-z |
Sumario: | BACKGROUND: Modern developments in single-cell sequencing technologies enable broad insights into cellular state. Single-cell RNA sequencing (scRNA-seq) can be used to explore cell types, states, and developmental trajectories to broaden our understanding of cellular heterogeneity in tissues and organs. Analysis of these sparse, high-dimensional experimental results requires dimension reduction. Several methods have been developed to estimate low-dimensional embeddings for filtered and normalized single-cell data. However, methods have yet to be developed for unfiltered and unnormalized count data that estimate uncertainty in the low-dimensional space. We present a nonlinear latent variable model with robust, heavy-tailed error and adaptive kernel learning to estimate low-dimensional nonlinear structure in scRNA-seq data. RESULTS: Gene expression in a single cell is modeled as a noisy draw from a Gaussian process in high dimensions from low-dimensional latent positions. This model is called the Gaussian process latent variable model (GPLVM). We model residual errors with a heavy-tailed Student’s t-distribution to estimate a manifold that is robust to technical and biological noise found in normalized scRNA-seq data. We compare our approach to common dimension reduction tools across a diverse set of scRNA-seq data sets to highlight our model’s ability to enable important downstream tasks such as clustering, inferring cell developmental trajectories, and visualizing high throughput experiments on available experimental data. CONCLUSION: We show that our adaptive robust statistical approach to estimate a nonlinear manifold is well suited for raw, unfiltered gene counts from high-throughput sequencing technologies for visualization, exploration, and uncertainty estimation of cell states. |
---|