Semisoft clustering of single-cell data
Motivated by the dynamics of development, in which cells of recognizable types, or pure cell types, transition into other types over time, we propose a method of semisoft clustering that can classify both pure and intermediate cell types from data on gene expression from individual cells. Called sem...
Main Authors: | Zhu, Lingxue; Lei, Jing; Klei, Lambertus; Devlin, Bernie; Roeder, Kathryn |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | National Academy of Sciences 2019 |
Subjects: | Physical Sciences |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6329952/ https://www.ncbi.nlm.nih.gov/pubmed/30587579 http://dx.doi.org/10.1073/pnas.1817715116 |
_version_ | 1783386903426367488 |
---|---|
author | Zhu, Lingxue; Lei, Jing; Klei, Lambertus; Devlin, Bernie; Roeder, Kathryn |
author_facet | Zhu, Lingxue; Lei, Jing; Klei, Lambertus; Devlin, Bernie; Roeder, Kathryn |
author_sort | Zhu, Lingxue |
collection | PubMed |
description | Motivated by the dynamics of development, in which cells of recognizable types, or pure cell types, transition into other types over time, we propose a method of semisoft clustering that can classify both pure and intermediate cell types from data on gene expression from individual cells. Called semisoft clustering with pure cells (SOUP), this algorithm reveals the clustering structure for both pure cells and transitional cells with soft memberships. SOUP involves a two-step process: Identify the set of pure cells and then estimate a membership matrix. To find pure cells, SOUP uses the special block structure in the expression similarity matrix. Once pure cells are identified, they provide the key information from which the membership matrix can be computed. By modeling cells as a continuous mixture of [Formula: see text] discrete types we obtain more parsimonious results than obtained with standard clustering algorithms. Moreover, using soft membership estimates of cell type cluster centers leads to better estimates of developmental trajectories. The strong performance of SOUP is documented via simulation studies, which show its robustness to violations of modeling assumptions. The advantages of SOUP are illustrated by analyses of two independent datasets of gene expression from a large number of cells from fetal brain. |
format | Online Article Text |
id | pubmed-6329952 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2019 |
publisher | National Academy of Sciences |
record_format | MEDLINE/PubMed |
spelling | pubmed-6329952 2019-01-14 Semisoft clustering of single-cell data Zhu, Lingxue Lei, Jing Klei, Lambertus Devlin, Bernie Roeder, Kathryn Proc Natl Acad Sci U S A Physical Sciences National Academy of Sciences 2019-01-08 2018-12-26 /pmc/articles/PMC6329952/ /pubmed/30587579 http://dx.doi.org/10.1073/pnas.1817715116 Text en Copyright © 2019 the Author(s). Published by PNAS. https://creativecommons.org/licenses/by-nc-nd/4.0/ This open access article is distributed under Creative Commons Attribution-NonCommercial-NoDerivatives License 4.0 (CC BY-NC-ND) (https://creativecommons.org/licenses/by-nc-nd/4.0/). |
title | Semisoft clustering of single-cell data |
topic | Physical Sciences |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6329952/ https://www.ncbi.nlm.nih.gov/pubmed/30587579 http://dx.doi.org/10.1073/pnas.1817715116 |
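
The two-step process named in the description (identify the set of pure cells, then estimate a membership matrix) can be illustrated with a small, noise-free toy example. This is only a sketch under stated assumptions, not the SOUP implementation: the record does not give the algorithm's details, so the pure-cell search below swaps in the successive projection algorithm instead of the similarity-matrix block structure that SOUP uses, and the membership step is a plain least-squares fit. The function names `find_pure_cells_spa` and `estimate_membership`, the toy `centers`, and the toy `theta_true` are all hypothetical.

```python
import numpy as np


def find_pure_cells_spa(X, k):
    """Pick k candidate pure cells (rows of X) by successive projection.

    Assumption: every cell is a convex mixture of k linearly independent
    type centers and at least one pure cell exists per type. This is a
    stand-in for SOUP's pure-cell search, not a reproduction of it.
    """
    R = np.asarray(X, dtype=float).copy()
    picked = []
    for _ in range(k):
        sq_norms = np.einsum("ij,ij->i", R, R)  # squared row norms
        i = int(np.argmax(sq_norms))            # most extreme remaining cell
        picked.append(i)
        u = R[i] / np.sqrt(sq_norms[i])         # unit vector along that cell
        R = R - np.outer(R @ u, u)              # remove that direction
    return picked


def estimate_membership(X, pure_idx):
    """Least-squares soft memberships relative to the chosen pure cells."""
    C = X[pure_idx]                              # k x genes, one row per type
    theta_t, *_ = np.linalg.lstsq(C.T, X.T, rcond=None)
    theta = np.clip(theta_t.T, 0.0, None)        # memberships are nonnegative
    row_sums = np.maximum(theta.sum(axis=1, keepdims=True), 1e-12)
    return theta / row_sums                      # each row sums to 1


# Toy data (made up): 3 cell types, 4 genes, 3 pure and 3 transitional cells.
centers = np.array([
    [4.0, 0.0, 0.0, 1.0],
    [0.0, 3.0, 0.0, 1.0],
    [0.0, 0.0, 5.0, 1.0],
])
theta_true = np.array([
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
    [0.5, 0.5, 0.0],
    [0.2, 0.3, 0.5],
    [0.0, 0.7, 0.3],
])
X = theta_true @ centers                         # noise-free expression matrix

pure_idx = find_pure_cells_spa(X, k=3)
theta_est = estimate_membership(X, pure_idx)
print(pure_idx)
print(np.round(theta_est, 3))
```

The columns of `theta_est` follow the order in which the pure cells were picked, so they match the true memberships only up to a relabeling of the types. With real, noisy expression data neither step would be exact, and a method built for that setting would be needed.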