Statistical significance of cluster membership for unsupervised evaluation of cell identities
Main author: | Chung, Neo Christopher |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Oxford University Press, 2020 |
Subjects: | Original Papers |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7214036/ https://www.ncbi.nlm.nih.gov/pubmed/32142108 http://dx.doi.org/10.1093/bioinformatics/btaa087 |
_version_ | 1783531900021768192 |
---|---|
author | Chung, Neo Christopher |
collection | PubMed |
description | MOTIVATION: Single-cell RNA sequencing (scRNA-seq) allows us to dissect transcriptional heterogeneity arising from cell types, spatio-temporal contexts and environmental stimuli. Transcriptional heterogeneity may reflect phenotypes and molecular signatures that are often unmeasured or unknown a priori. Cell identities of samples derived from heterogeneous subpopulations are then determined by clustering the scRNA-seq data, and these cell identities are used in downstream analyses. How can we examine whether cell identities are accurately inferred? Unlike external measurements or labels for single cells, clustering-based cell identities are inferred from the same data, so using them can result in spurious signals and false discoveries. RESULTS: We introduce non-parametric methods to evaluate cell identities by testing cluster memberships in an unsupervised manner. Diverse simulation studies demonstrate the accuracy of the jackstraw test for cluster membership. We propose a posterior probability that a cell should be included in its clustering-based subpopulation. Posterior inclusion probabilities (PIPs) for cluster memberships can be used to select and visualize samples relevant to subpopulations. The proposed methods are applied to three scRNA-seq datasets. First, a mixture of Jurkat and 293T cell lines provides two distinct cellular populations. Second, Cell Hashing yields cell identities corresponding to eight donors, which are independently analyzed by the jackstraw. Third, peripheral blood mononuclear cells are used to explore heterogeneous immune populations. The proposed P-values and PIPs lead to probabilistic feature selection of single cells that can be visualized using principal component analysis (PCA), t-distributed stochastic neighbor embedding (t-SNE) and other methods. By estimating uncertainty in clustering high-dimensional data, the proposed methods enable unsupervised evaluation of cluster membership. AVAILABILITY AND IMPLEMENTATION: https://cran.r-project.org/package=jackstraw.
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online. |
id | pubmed-7214036 |
institution | National Center for Biotechnology Information |
record_format | MEDLINE/PubMed |
journal | Bioinformatics (Original Papers), Oxford University Press; dates: 2020-05-15, 2020-03-06 |
rights | © The Author(s) 2020. Published by Oxford University Press. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited. |
topic | Original Papers |
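The RESULTS field above describes a jackstraw test for cluster membership and posterior inclusion probabilities (PIPs). As an illustration only, the Python sketch below implements one simplified version of that idea. It is not the code of the `jackstraw` R package, and every helper name (`assign`, `farthest_first`, `kmeans`, `f_stat`, `jackstraw_membership`, `crude_pip`) is hypothetical. Assumptions: rows are cells and columns are genes; clustering is K-means with farthest-first seeding; a cell's statistic is the F-statistic from regressing its profile on its assigned cluster center; in each of `B` rounds, `s` randomly chosen cells are replaced by synthetic null cells (their values shuffled across genes) and the data are re-clustered starting from the original centers; PIPs are approximated as 1 minus a crude histogram-based local FDR using Storey's pi0 estimate.

```python
import numpy as np


def assign(X, centers):
    """Index of the nearest center (squared Euclidean distance) for each row of X."""
    d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return d.argmin(axis=1)


def farthest_first(X, k, rng):
    """Seed k centers: one random row, then repeatedly the row farthest from all chosen centers."""
    centers = [X[rng.integers(X.shape[0])]]
    for _ in range(k - 1):
        d = np.min([((X - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(X[d.argmax()])
    return np.array(centers, dtype=float)


def kmeans(X, centers, n_iter=100):
    """Lloyd's K-means on the rows of X, started from the given centers."""
    centers = np.array(centers, dtype=float)
    for _ in range(n_iter):
        labels = assign(X, centers)
        # Keep the old center if a cluster becomes empty.
        new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
                        for j in range(len(centers))])
        if np.allclose(new, centers):
            break
        centers = new
    return assign(X, centers), centers


def f_stat(cell, center):
    """F-statistic of a simple linear regression of one cell's profile on its cluster center."""
    r = np.corrcoef(cell, center)[0, 1]
    r2 = min(r * r, 1.0 - 1e-12)  # guard against division by zero when r = +/-1
    return r2 / (1.0 - r2) * (cell.size - 2)


def jackstraw_membership(X, k, s=10, B=100, seed=0):
    """Empirical p-value per cell (row of X) for association with its assigned cluster center."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    labels, centers = kmeans(X, farthest_first(X, k, rng))
    obs = np.array([f_stat(X[i], centers[labels[i]]) for i in range(n)])
    null = []
    for _ in range(B):
        idx = rng.choice(n, size=s, replace=False)
        Xb = X.copy()
        for i in idx:
            Xb[i] = rng.permutation(X[i])  # synthetic null cell: shuffle its values across genes
        lb, cb = kmeans(Xb, centers)       # re-cluster, starting from the original centers
        null.extend(f_stat(Xb[i], cb[lb[i]]) for i in idx)
    null = np.sort(np.array(null))
    n_ge = null.size - np.searchsorted(null, obs, side="left")  # null stats >= observed
    return labels, (1 + n_ge) / (1 + null.size)


def crude_pip(pvals, lam=0.5, bins=20):
    """PIP ~ 1 - local FDR, with pi0 from Storey's estimator and a histogram density of p-values."""
    pi0 = min(1.0, np.mean(pvals > lam) / (1.0 - lam))
    dens, edges = np.histogram(pvals, bins=bins, range=(0.0, 1.0), density=True)
    which = np.clip(np.digitize(pvals, edges) - 1, 0, bins - 1)
    lfdr = np.minimum(1.0, pi0 / np.maximum(dens[which], 1e-12))
    return 1.0 - lfdr


# Illustrative simulated data (not from the paper): 60 cells x 50 genes in two groups.
rng = np.random.default_rng(1)
mu1, mu2 = rng.normal(0, 3, 50), rng.normal(0, 3, 50)
X = np.vstack([mu1 + rng.normal(0, 1, (30, 50)), mu2 + rng.normal(0, 1, (30, 50))])
labels, pvals = jackstraw_membership(X, k=2, s=5, B=50)
pip = crude_pip(pvals)
```

On these two well-separated simulated groups, every cell should be recovered in its group and receive a small p-value and a PIP of 1, because no synthetic null cell correlates with a center as strongly as a real member does. The empirical p-values use a +1 correction so they are never exactly zero; for real scRNA-seq analyses, the CRAN `jackstraw` package linked above is the reference implementation rather than this sketch.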