Ultrafast clustering of single-cell flow cytometry data using FlowGrid

BACKGROUND: Flow cytometry is a popular technology for quantitative single-cell profiling of cell surface markers. It enables expression measurement of tens of cell surface protein markers in millions of single cells. It is a powerful tool for discovering cell sub-populations and quantifying cell po...

Bibliographic Details
Main Authors: Ye, Xiaoxin; Ho, Joshua W. K.
Format: Online Article Text
Language: English
Published: BioMed Central 2019
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6449887/
https://www.ncbi.nlm.nih.gov/pubmed/30953498
http://dx.doi.org/10.1186/s12918-019-0690-2
_version_ 1783408942442872832
author Ye, Xiaoxin
Ho, Joshua W. K.
author_facet Ye, Xiaoxin
Ho, Joshua W. K.
author_sort Ye, Xiaoxin
collection PubMed
description BACKGROUND: Flow cytometry is a popular technology for quantitative single-cell profiling of cell surface markers. It enables expression measurement of tens of cell surface protein markers in millions of single cells. It is a powerful tool for discovering cell sub-populations and quantifying cell population heterogeneity. Traditionally, scientists use manual gating to identify cell types, but the process is subjective and is not effective for large multidimensional data. Many clustering algorithms have been developed to analyse these data, but most of them do not scale to very large data sets with more than ten million cells. RESULTS: Here, we present a new clustering algorithm that combines the advantages of the density-based clustering algorithm DBSCAN with the scalability of grid-based clustering. This new clustering algorithm is implemented in Python as an open source package, FlowGrid. FlowGrid is memory efficient and scales linearly with respect to the number of cells. We have evaluated the performance of FlowGrid against other state-of-the-art clustering programs and found that FlowGrid produces similar clustering results in substantially less time. For example, FlowGrid is able to complete a clustering task on a data set of 23.6 million cells in less than 12 seconds, while other algorithms take more than 500 seconds or fail with an error. CONCLUSIONS: FlowGrid is an ultrafast clustering algorithm for large single-cell flow cytometry data. The source code is available at https://github.com/VCCRI/FlowGrid.
format Online
Article
Text
id pubmed-6449887
institution National Center for Biotechnology Information
language English
publishDate 2019
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-6449887 2019-04-15 Ultrafast clustering of single-cell flow cytometry data using FlowGrid Ye, Xiaoxin Ho, Joshua W. K. BMC Syst Biol Methods
BioMed Central 2019-04-05 /pmc/articles/PMC6449887/ /pubmed/30953498 http://dx.doi.org/10.1186/s12918-019-0690-2 Text en © The Author(s) 2019 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
spellingShingle Methods
Ye, Xiaoxin
Ho, Joshua W. K.
Ultrafast clustering of single-cell flow cytometry data using FlowGrid
title Ultrafast clustering of single-cell flow cytometry data using FlowGrid
topic Methods
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6449887/
https://www.ncbi.nlm.nih.gov/pubmed/30953498
http://dx.doi.org/10.1186/s12918-019-0690-2
work_keys_str_mv AT yexiaoxin ultrafastclusteringofsinglecellflowcytometrydatausingflowgrid
AT hojoshuawk ultrafastclusteringofsinglecellflowcytometrydatausingflowgrid
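
The description above says the algorithm combines density-based clustering in the style of DBSCAN with grid-based clustering. The following is a minimal sketch of that general idea only, not the FlowGrid implementation: cells are binned onto a regular grid in one pass, bins holding enough cells are treated as dense, and touching dense bins are merged into clusters. The function name grid_density_cluster and the parameters n_bins and min_count are hypothetical and chosen for illustration; the published package may define bins, density and neighbourhoods differently.

```python
from collections import Counter, deque
from itertools import product

import numpy as np


def grid_density_cluster(points, n_bins=4, min_count=3):
    """Toy grid-based density clustering (illustrative, not FlowGrid's code).

    points: (n_cells, n_markers) array-like of marker intensities.
    n_bins: bins per dimension (hypothetical parameter).
    min_count: cells needed for a bin to count as dense (hypothetical).
    Returns an int array of cluster labels; -1 marks cells in sparse bins.
    """
    points = np.asarray(points, dtype=float)
    lo = points.min(axis=0)
    span = points.max(axis=0) - lo
    span[span == 0] = 1.0  # avoid division by zero for constant markers

    # Assign every cell to an integer bin coordinate in a single pass.
    coords = np.minimum((n_bins * (points - lo) / span).astype(int), n_bins - 1)
    keys = [tuple(c) for c in coords.tolist()]

    # A bin is "dense" when it holds at least min_count cells.
    counts = Counter(keys)
    dense = {k for k, c in counts.items() if c >= min_count}

    # Two bins touch when they differ by at most 1 in every dimension.
    # Enumerating 3**d - 1 offsets is only practical for few dimensions.
    offsets = [o for o in product((-1, 0, 1), repeat=points.shape[1]) if any(o)]

    # Breadth-first search over dense bins: the clustering step works on
    # bins rather than on individual cells.
    bin_label = {}
    next_label = 0
    for start in sorted(dense):
        if start in bin_label:
            continue
        bin_label[start] = next_label
        queue = deque([start])
        while queue:
            cur = queue.popleft()
            for off in offsets:
                nb = tuple(a + b for a, b in zip(cur, off))
                if nb in dense and nb not in bin_label:
                    bin_label[nb] = next_label
                    queue.append(nb)
        next_label += 1

    return np.array([bin_label.get(k, -1) for k in keys])


# Two tight groups of made-up "cells" plus one isolated cell.
demo = [[0, 0], [0.1, 0.1], [0.2, 0.0], [0.0, 0.2],
        [10, 10], [9.9, 9.8], [9.8, 10], [10, 9.9],
        [5, 5]]
print(grid_density_cluster(demo).tolist())  # [0, 0, 0, 0, 1, 1, 1, 1, -1]
```

In this sketch the expensive neighbourhood search runs over occupied bins instead of individual cells, which is one plausible reason a grid-based approach can stay fast as the number of cells grows; the demo data are invented and say nothing about the benchmark figures quoted in the description.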