
Semisoft clustering of single-cell data

Motivated by the dynamics of development, in which cells of recognizable types, or pure cell types, transition into other types over time, we propose a method of semisoft clustering that can classify both pure and intermediate cell types from data on gene expression from individual cells. Called sem...


Bibliographic Details
Main Authors: Zhu, Lingxue; Lei, Jing; Klei, Lambertus; Devlin, Bernie; Roeder, Kathryn
Format: Online Article Text
Language: English
Published: National Academy of Sciences 2019
Subjects: Physical Sciences
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6329952/
https://www.ncbi.nlm.nih.gov/pubmed/30587579
http://dx.doi.org/10.1073/pnas.1817715116
_version_ 1783386903426367488
author Zhu, Lingxue
Lei, Jing
Klei, Lambertus
Devlin, Bernie
Roeder, Kathryn
collection PubMed
description Motivated by the dynamics of development, in which cells of recognizable types, or pure cell types, transition into other types over time, we propose a method of semisoft clustering that can classify both pure and intermediate cell types from data on gene expression from individual cells. Called semisoft clustering with pure cells (SOUP), this algorithm reveals the clustering structure for both pure cells and transitional cells with soft memberships. SOUP involves a two-step process: Identify the set of pure cells and then estimate a membership matrix. To find pure cells, SOUP uses the special block structure in the expression similarity matrix. Once pure cells are identified, they provide the key information from which the membership matrix can be computed. By modeling cells as a continuous mixture of K discrete types we obtain more parsimonious results than obtained with standard clustering algorithms. Moreover, using soft membership estimates of cell type cluster centers leads to better estimates of developmental trajectories. The strong performance of SOUP is documented via simulation studies, which show its robustness to violations of modeling assumptions. The advantages of SOUP are illustrated by analyses of two independent datasets of gene expression from a large number of cells from fetal brain.
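The idea of a soft membership described above can be illustrated with a minimal sketch. This is not the SOUP algorithm from the article: it assumes only two pure types whose cluster centers are already known, and it recovers a cell's membership by projecting its expression profile onto the line segment between those centers. The function name `soft_membership` and all expression values are hypothetical, chosen only for illustration.

```python
# Illustrative sketch only: NOT the SOUP algorithm from the article.
# Assumes K = 2 pure types whose cluster centers are already known.

def soft_membership(cell, center_a, center_b):
    """Return (theta_a, theta_b), the weights of the convex mixture
    theta_a * center_a + theta_b * center_b closest to `cell`
    in the least-squares sense."""
    diff = [a - b for a, b in zip(center_a, center_b)]
    denom = sum(d * d for d in diff)
    if denom == 0:
        raise ValueError("cluster centers must be distinct")
    # Project (cell - center_b) onto the direction center_a - center_b.
    t = sum((x - b) * d for x, b, d in zip(cell, center_b, diff)) / denom
    theta_a = min(1.0, max(0.0, t))  # clip so the weights stay in [0, 1]
    return theta_a, 1.0 - theta_a


# Hypothetical expression profiles over three genes.
center_a = [10.0, 0.0, 2.0]
center_b = [0.0, 8.0, 2.0]

pure_cell = [10.0, 0.0, 2.0]         # coincides with center_a
transitional_cell = [5.0, 4.0, 2.0]  # halfway between the two centers

pure_membership = soft_membership(pure_cell, center_a, center_b)
transitional_membership = soft_membership(transitional_cell, center_a, center_b)
print(pure_membership)          # (1.0, 0.0)
print(transitional_membership)  # (0.5, 0.5)
```

In this toy setting a pure cell places all of its weight on a single type, while a transitional cell splits its weight between types. A hard clustering would have to assign the transitional cell to one of the two types.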
format Online
Article
Text
id pubmed-6329952
institution National Center for Biotechnology Information
language English
publishDate 2019
publisher National Academy of Sciences
record_format MEDLINE/PubMed
spelling pubmed-6329952 2019-01-14 Semisoft clustering of single-cell data Zhu, Lingxue Lei, Jing Klei, Lambertus Devlin, Bernie Roeder, Kathryn Proc Natl Acad Sci U S A Physical Sciences Motivated by the dynamics of development, in which cells of recognizable types, or pure cell types, transition into other types over time, we propose a method of semisoft clustering that can classify both pure and intermediate cell types from data on gene expression from individual cells. Called semisoft clustering with pure cells (SOUP), this algorithm reveals the clustering structure for both pure cells and transitional cells with soft memberships. SOUP involves a two-step process: Identify the set of pure cells and then estimate a membership matrix. To find pure cells, SOUP uses the special block structure in the expression similarity matrix. Once pure cells are identified, they provide the key information from which the membership matrix can be computed. By modeling cells as a continuous mixture of K discrete types we obtain more parsimonious results than obtained with standard clustering algorithms. Moreover, using soft membership estimates of cell type cluster centers leads to better estimates of developmental trajectories. The strong performance of SOUP is documented via simulation studies, which show its robustness to violations of modeling assumptions. The advantages of SOUP are illustrated by analyses of two independent datasets of gene expression from a large number of cells from fetal brain. National Academy of Sciences 2019-01-08 2018-12-26 /pmc/articles/PMC6329952/ /pubmed/30587579 http://dx.doi.org/10.1073/pnas.1817715116 Text en Copyright © 2019 the Author(s). Published by PNAS. https://creativecommons.org/licenses/by-nc-nd/4.0/ This open access article is distributed under Creative Commons Attribution-NonCommercial-NoDerivatives License 4.0 (CC BY-NC-ND) (https://creativecommons.org/licenses/by-nc-nd/4.0/).
title Semisoft clustering of single-cell data
topic Physical Sciences
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6329952/
https://www.ncbi.nlm.nih.gov/pubmed/30587579
http://dx.doi.org/10.1073/pnas.1817715116