
Data reduction for spectral clustering to analyze high throughput flow cytometry data

BACKGROUND: Recent biological discoveries have shown that clustering large datasets is essential for better understanding biology in many areas. Spectral clustering in particular has proven to be a powerful tool amenable to many applications. However, it cannot be directly applied to large datasets...


Bibliographic Details
Main authors:	Zare, Habil; Shooshtari, Parisa; Gupta, Arvind; Brinkman, Ryan R
Format:	Text
Language:	English
Published:	BioMed Central 2010
Subjects:	Methodology Article
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2923634/ https://www.ncbi.nlm.nih.gov/pubmed/20667133 http://dx.doi.org/10.1186/1471-2105-11-403

author	Zare, Habil; Shooshtari, Parisa; Gupta, Arvind; Brinkman, Ryan R
author_sort	Zare, Habil
collection	PubMed
description	BACKGROUND: Recent biological discoveries have shown that clustering large datasets is essential for better understanding biology in many areas. Spectral clustering in particular has proven to be a powerful tool amenable to many applications. However, it cannot be directly applied to large datasets due to time and memory limitations. To address this issue, we have modified spectral clustering by adding an information-preserving sampling procedure and applying a post-processing stage. We call this entire algorithm SamSPECTRAL. RESULTS: We tested our algorithm on flow cytometry data as an example of large, multidimensional data containing potentially hundreds of thousands of data points (i.e., "events" in flow cytometry, typically corresponding to cells). Compared to two state-of-the-art model-based flow cytometry clustering methods, SamSPECTRAL demonstrates significant advantages in proper identification of populations with non-elliptical shapes, low-density populations close to dense ones, minor subpopulations of a major population and rare populations. CONCLUSIONS: This work is the first successful attempt to apply spectral methodology to flow cytometry data. An implementation of our algorithm as an R package is freely available through BioConductor.
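The description outlines the general pattern only at a high level: reduce the data by sampling, run spectral clustering on the reduced set, then map the result back to every event. The Python sketch below illustrates that pattern on toy 2-D data. It is not the SamSPECTRAL algorithm and not the API of the Bioconductor package. The radius-based sampling rule, the size-weighted Gaussian similarity, the two-way split via the second eigenvector of D^{-1/2} W D^{-1/2}, and the names `h`, `sigma`, and `sampled_spectral_two_way` are all hypothetical choices made for illustration, and the post-processing stage mentioned in the description is omitted.

```python
import math
import random


def sample_representatives(points, h, seed=0):
    # Hypothetical radius-based sampling: visit points in random order; each
    # still-unassigned point becomes a representative and absorbs every
    # unassigned point within distance h (including itself).
    order = list(range(len(points)))
    random.Random(seed).shuffle(order)
    remaining = set(order)
    reps, owner = [], [None] * len(points)
    for i in order:
        if i not in remaining:
            continue
        rep = len(reps)
        reps.append(i)
        for j in list(remaining):
            if math.dist(points[i], points[j]) <= h:
                owner[j] = rep
                remaining.discard(j)
    return reps, owner


def similarity_matrix(points, reps, owner, sigma):
    # Gaussian kernel between representatives, scaled by community sizes so
    # that larger communities carry more weight (an illustrative assumption).
    sizes = [0] * len(reps)
    for o in owner:
        sizes[o] += 1
    m = len(reps)
    W = [[0.0] * m for _ in range(m)]
    for a in range(m):
        for b in range(m):
            if a != b:
                d2 = math.dist(points[reps[a]], points[reps[b]]) ** 2
                W[a][b] = sizes[a] * sizes[b] * math.exp(-d2 / (2 * sigma ** 2))
    return W


def spectral_bipartition(W, iters=500):
    # Two-way split from the second eigenvector of M = D^{-1/2} W D^{-1/2}.
    # Assumes every representative has a positive degree.
    m = len(W)
    deg = [sum(row) for row in W]
    inv_sqrt = [1.0 / math.sqrt(d) for d in deg]
    # The top eigenvector of M (eigenvalue 1) is proportional to sqrt(deg).
    top = [math.sqrt(d) for d in deg]
    norm = math.sqrt(sum(t * t for t in top))
    top = [t / norm for t in top]
    rng = random.Random(1)
    v = [rng.uniform(-1.0, 1.0) for _ in range(m)]
    for _ in range(iters):
        # Project out the known top eigenvector.
        c = sum(a * b for a, b in zip(v, top))
        v = [a - c * b for a, b in zip(v, top)]
        # Power step with M + I: the shift makes all eigenvalues non-negative,
        # so the deflated iteration converges to the second eigenvector of M.
        w = [v[i] + inv_sqrt[i] * sum(W[i][j] * inv_sqrt[j] * v[j] for j in range(m))
             for i in range(m)]
        n = math.sqrt(sum(x * x for x in w))
        v = [x / n for x in w]
    # Rescale by D^{-1/2} and split representatives by sign.
    f = [v[i] * inv_sqrt[i] for i in range(m)]
    return [0 if x < 0 else 1 for x in f]


def sampled_spectral_two_way(points, h, sigma, seed=0):
    reps, owner = sample_representatives(points, h, seed)
    W = similarity_matrix(points, reps, owner, sigma)
    rep_labels = spectral_bipartition(W)
    # Each point inherits the label of the representative that absorbed it.
    return [rep_labels[o] for o in owner]


# Toy usage: two well-separated 2-D blobs of 200 points each.
rng = random.Random(42)
blob_a = [(rng.gauss(0, 0.3), rng.gauss(0, 0.3)) for _ in range(200)]
blob_b = [(rng.gauss(4, 0.3), rng.gauss(4, 0.3)) for _ in range(200)]
labels = sampled_spectral_two_way(blob_a + blob_b, h=0.5, sigma=1.0)
```

The eigenproblem here is solved on the representatives only, so its size depends on how many communities the sampling step produces rather than on the total number of events. That is the time and memory saving the description refers to. Which blob receives label 0 or 1 is arbitrary.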
format	Text
id	pubmed-2923634
institution	National Center for Biotechnology Information
language	English
publishDate	2010
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-29236342010-08-21 Data reduction for spectral clustering to analyze high throughput flow cytometry data Zare, Habil Shooshtari, Parisa Gupta, Arvind Brinkman, Ryan R BMC Bioinformatics Methodology Article BACKGROUND: Recent biological discoveries have shown that clustering large datasets is essential for better understanding biology in many areas. Spectral clustering in particular has proven to be a powerful tool amenable to many applications. However, it cannot be directly applied to large datasets due to time and memory limitations. To address this issue, we have modified spectral clustering by adding an information-preserving sampling procedure and applying a post-processing stage. We call this entire algorithm SamSPECTRAL. RESULTS: We tested our algorithm on flow cytometry data as an example of large, multidimensional data containing potentially hundreds of thousands of data points (i.e., "events" in flow cytometry, typically corresponding to cells). Compared to two state-of-the-art model-based flow cytometry clustering methods, SamSPECTRAL demonstrates significant advantages in proper identification of populations with non-elliptical shapes, low-density populations close to dense ones, minor subpopulations of a major population and rare populations. CONCLUSIONS: This work is the first successful attempt to apply spectral methodology to flow cytometry data. An implementation of our algorithm as an R package is freely available through BioConductor. BioMed Central 2010-07-28 /pmc/articles/PMC2923634/ /pubmed/20667133 http://dx.doi.org/10.1186/1471-2105-11-403 Text en Copyright ©2010 Zare et al; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title	Data reduction for spectral clustering to analyze high throughput flow cytometry data
topic	Methodology Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2923634/ https://www.ncbi.nlm.nih.gov/pubmed/20667133 http://dx.doi.org/10.1186/1471-2105-11-403

