PARC: ultrafast and accurate clustering of phenotypic data of millions of single cells
Main authors: | Stassen, Shobana V; Siu, Dickson M D; Lee, Kelvin C M; Ho, Joshua W K; So, Hayden K H; Tsia, Kevin K |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Oxford University Press, 2020 |
Subjects: | Original Papers |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7203756/ https://www.ncbi.nlm.nih.gov/pubmed/31971583 http://dx.doi.org/10.1093/bioinformatics/btaa042 |
author | Stassen, Shobana V; Siu, Dickson M D; Lee, Kelvin C M; Ho, Joshua W K; So, Hayden K H; Tsia, Kevin K |
---|---|
collection | PubMed |
description | MOTIVATION: New single-cell technologies continue to fuel the explosive growth in the scale of heterogeneous single-cell data. However, existing computational methods are inadequately scalable to large datasets and therefore cannot uncover the complex cellular heterogeneity. RESULTS: We introduce a highly scalable graph-based clustering algorithm PARC—Phenotyping by Accelerated Refined Community-partitioning—for large-scale, high-dimensional single-cell data (>1 million cells). Using large single-cell flow and mass cytometry, RNA-seq and imaging-based biophysical data, we demonstrate that PARC consistently outperforms state-of-the-art clustering algorithms without subsampling of cells, including Phenograph, FlowSOM and Flock, in terms of both speed and ability to robustly detect rare cell populations. For example, PARC can cluster a single-cell dataset of 1.1 million cells within 13 min, compared with >2 h for the next fastest graph-clustering algorithm. Our work presents a scalable algorithm to cope with increasingly large-scale single-cell analysis. AVAILABILITY AND IMPLEMENTATION: https://github.com/ShobiStassen/PARC. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online. |
id | pubmed-7203756 |
institution | National Center for Biotechnology Information |
record_format | MEDLINE/PubMed |
journal | Bioinformatics |
dates | 2020-05-01 (issue); 2020-01-23 (published online) |
rights | © The Author(s) 2020. Published by Oxford University Press. This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com |
topic | Original Papers |