Statistical significance of cluster membership for unsupervised evaluation of cell identities

MOTIVATION: Single-cell RNA-sequencing (scRNA-seq) allows us to dissect transcriptional heterogeneity arising from cellular types, spatio-temporal contexts and environmental stimuli. Transcriptional heterogeneity may reflect phenotypes and molecular signatures that are often unmeasured or unknown a...

Bibliographic Details
Main Author:	Chung, Neo Christopher
Format:	Online Article Text
Language:	English
Published:	Oxford University Press 2020
Subjects:	Original Papers
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7214036/ https://www.ncbi.nlm.nih.gov/pubmed/32142108 http://dx.doi.org/10.1093/bioinformatics/btaa087

_version_	1783531900021768192
author	Chung, Neo Christopher
collection	PubMed
description	MOTIVATION: Single-cell RNA-sequencing (scRNA-seq) allows us to dissect transcriptional heterogeneity arising from cellular types, spatio-temporal contexts and environmental stimuli. Transcriptional heterogeneity may reflect phenotypes and molecular signatures that are often unmeasured or unknown a priori. Cell identities of samples derived from heterogeneous subpopulations are then determined by clustering of scRNA-seq data. These cell identities are used in downstream analyses. How can we examine if cell identities are accurately inferred? Unlike external measurements or labels for single cells, using clustering-based cell identities result in spurious signals and false discoveries. RESULTS: We introduce non-parametric methods to evaluate cell identities by testing cluster memberships in an unsupervised manner. Diverse simulation studies demonstrate accuracy of the jackstraw test for cluster membership. We propose a posterior probability that a cell should be included in that clustering-based subpopulation. Posterior inclusion probabilities (PIPs) for cluster memberships can be used to select and visualize samples relevant to subpopulations. The proposed methods are applied on three scRNA-seq datasets. First, a mixture of Jurkat and 293T cell lines provides two distinct cellular populations. Second, Cell Hashing yields cell identities corresponding to eight donors which are independently analyzed by the jackstraw. Third, peripheral blood mononuclear cells are used to explore heterogeneous immune populations. The proposed P-values and PIPs lead to probabilistic feature selection of single cells that can be visualized using principal component analysis (PCA), t-distributed stochastic neighbor embedding (t-SNE) and others. By learning uncertainty in clustering high-dimensional data, the proposed methods enable unsupervised evaluation of cluster membership. AVAILABILITY AND IMPLEMENTATION: https://cran.r-project.org/package=jackstraw. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
format	Online Article Text
id	pubmed-7214036
institution	National Center for Biotechnology Information
language	English
publishDate	2020
publisher	Oxford University Press
record_format	MEDLINE/PubMed
spelling	pubmed-7214036 2020-05-15 Statistical significance of cluster membership for unsupervised evaluation of cell identities Chung, Neo Christopher Bioinformatics Original Papers Oxford University Press 2020-05-15 2020-03-06 /pmc/articles/PMC7214036/ /pubmed/32142108 http://dx.doi.org/10.1093/bioinformatics/btaa087 Text en © The Author(s) 2020. Published by Oxford University Press. http://creativecommons.org/licenses/by/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
title	Statistical significance of cluster membership for unsupervised evaluation of cell identities
topic	Original Papers
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7214036/ https://www.ncbi.nlm.nih.gov/pubmed/32142108 http://dx.doi.org/10.1093/bioinformatics/btaa087
