SWIFT—Scalable Clustering for Automated Identification of Rare Cell Populations in Large, High-Dimensional Flow Cytometry Datasets, Part 1: Algorithm Design

We present a model-based clustering method, SWIFT (Scalable Weighted Iterative Flow-clustering Technique), for digesting high-dimensional large-sized datasets obtained via modern flow cytometry into more compact representations that are well-suited for further automated or manual analysis. Key attri...

Bibliographic Details
Main Authors:	Naim, Iftekhar; Datta, Suprakash; Rebhahn, Jonathan; Cavenaugh, James S; Mosmann, Tim R; Sharma, Gaurav
Format:	Online Article Text
Language:	English
Published:	BlackWell Publishing Ltd 2014
Subjects:	Original Articles
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4238829/ https://www.ncbi.nlm.nih.gov/pubmed/24677621 http://dx.doi.org/10.1002/cyto.a.22446

author	Naim, Iftekhar; Datta, Suprakash; Rebhahn, Jonathan; Cavenaugh, James S; Mosmann, Tim R; Sharma, Gaurav
collection	PubMed
description	We present a model-based clustering method, SWIFT (Scalable Weighted Iterative Flow-clustering Technique), for digesting high-dimensional large-sized datasets obtained via modern flow cytometry into more compact representations that are well-suited for further automated or manual analysis. Key attributes of the method include the following: (a) the analysis is conducted in the multidimensional space retaining the semantics of the data, (b) an iterative weighted sampling procedure is utilized to maintain modest computational complexity and to retain discrimination of extremely small subpopulations (hundreds of cells from datasets containing tens of millions), and (c) a splitting and merging procedure is incorporated in the algorithm to preserve distinguishability between biologically distinct populations, while still providing a significant compaction relative to the original data. This article presents a detailed algorithmic description of SWIFT, outlining the application-driven motivations for the different design choices, a discussion of computational complexity of the different steps, and results obtained with SWIFT for synthetic data and relatively simple experimental data that allow validation of the desirable attributes. A companion paper (Part 2) highlights the use of SWIFT, in combination with additional computational tools, for more challenging biological problems. © 2014 The Authors. Published by Wiley Periodicals Inc.
format	Online Article Text
id	pubmed-4238829
institution	National Center for Biotechnology Information
language	English
publishDate	2014
publisher	BlackWell Publishing Ltd
record_format	MEDLINE/PubMed
spelling	pubmed-4238829 2014-11-28 Cytometry A Original Articles BlackWell Publishing Ltd 2014-05 2014-02-14 /pmc/articles/PMC4238829/ /pubmed/24677621 http://dx.doi.org/10.1002/cyto.a.22446 Text en © 2014 The Authors. Published by Wiley Periodicals Inc. on behalf of the International Society for Advancement of Cytometry. http://creativecommons.org/licenses/by-nc/3.0/ This is an open access article under the terms of the Creative Commons Attribution-NonCommercial License, which permits use, distribution and reproduction in any medium, provided the original work is properly cited and is not used for commercial purposes.
title	SWIFT—Scalable Clustering for Automated Identification of Rare Cell Populations in Large, High-Dimensional Flow Cytometry Datasets, Part 1: Algorithm Design
topic	Original Articles
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4238829/ https://www.ncbi.nlm.nih.gov/pubmed/24677621 http://dx.doi.org/10.1002/cyto.a.22446

Similar Items