Statistical significance of cluster membership for unsupervised evaluation of cell identities

MOTIVATION: Single-cell RNA-sequencing (scRNA-seq) allows us to dissect transcriptional heterogeneity arising from cellular types, spatio-temporal contexts and environmental stimuli. Transcriptional heterogeneity may reflect phenotypes and molecular signatures that are often unmeasured or unknown a...

Bibliographic Details
Main author: Chung, Neo Christopher
Format: Online Article Text
Language: English
Published: Oxford University Press 2020
Subjects: Original Papers
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7214036/
https://www.ncbi.nlm.nih.gov/pubmed/32142108
http://dx.doi.org/10.1093/bioinformatics/btaa087
_version_ 1783531900021768192
author Chung, Neo Christopher
collection PubMed
description MOTIVATION: Single-cell RNA-sequencing (scRNA-seq) allows us to dissect transcriptional heterogeneity arising from cellular types, spatio-temporal contexts and environmental stimuli. Transcriptional heterogeneity may reflect phenotypes and molecular signatures that are often unmeasured or unknown a priori. Cell identities of samples derived from heterogeneous subpopulations are then determined by clustering of scRNA-seq data. These cell identities are used in downstream analyses. How can we examine whether cell identities are accurately inferred? Unlike using external measurements or labels for single cells, using clustering-based cell identities results in spurious signals and false discoveries. RESULTS: We introduce non-parametric methods to evaluate cell identities by testing cluster memberships in an unsupervised manner. Diverse simulation studies demonstrate the accuracy of the jackstraw test for cluster membership. We propose a posterior probability that a cell should be included in its clustering-based subpopulation. Posterior inclusion probabilities (PIPs) for cluster memberships can be used to select and visualize samples relevant to subpopulations. The proposed methods are applied to three scRNA-seq datasets. First, a mixture of Jurkat and 293T cell lines provides two distinct cellular populations. Second, Cell Hashing yields cell identities corresponding to eight donors, which are independently analyzed by the jackstraw. Third, peripheral blood mononuclear cells are used to explore heterogeneous immune populations. The proposed P-values and PIPs lead to probabilistic feature selection of single cells that can be visualized using principal component analysis (PCA), t-distributed stochastic neighbor embedding (t-SNE) and others. By learning uncertainty in clustering high-dimensional data, the proposed methods enable unsupervised evaluation of cluster membership. AVAILABILITY AND IMPLEMENTATION: https://cran.r-project.org/package=jackstraw.
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
format Online
Article
Text
id pubmed-7214036
institution National Center for Biotechnology Information
language English
publishDate 2020
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-7214036 2020-05-15 Statistical significance of cluster membership for unsupervised evaluation of cell identities Chung, Neo Christopher Bioinformatics Original Papers Oxford University Press 2020-05-15 2020-03-06 /pmc/articles/PMC7214036/ /pubmed/32142108 http://dx.doi.org/10.1093/bioinformatics/btaa087 Text en © The Author(s) 2020. Published by Oxford University Press. http://creativecommons.org/licenses/by/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
title Statistical significance of cluster membership for unsupervised evaluation of cell identities
topic Original Papers