Misty Mountain clustering: application to fast unsupervised flow cytometry gating

BACKGROUND: There are many important clustering questions in computational biology for which no satisfactory method exists. Automated clustering algorithms, when applied to large, multidimensional datasets, such as flow cytometry data, prove unsatisfactory in terms of speed, problems with local mini...

Bibliographic Details
Main authors: Sugár, István P, Sealfon, Stuart C
Format: Text
Language: English
Published: BioMed Central 2010
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2967560/
https://www.ncbi.nlm.nih.gov/pubmed/20932336
http://dx.doi.org/10.1186/1471-2105-11-502
author Sugár, István P
Sealfon, Stuart C
collection PubMed
description BACKGROUND: There are many important clustering questions in computational biology for which no satisfactory method exists. Automated clustering algorithms, when applied to large, multidimensional datasets such as flow cytometry data, prove unsatisfactory in terms of speed, problems with local minima, or cluster shape bias. Model-based approaches are restricted by the assumptions of the fitting functions. Furthermore, model-based clustering requires serial clustering for all cluster numbers within a user-defined interval. The final cluster number is then selected by various criteria. These supervised serial clustering methods are time-consuming, and different criteria frequently result in different optimal cluster numbers. Various unsupervised heuristic approaches that have been developed, such as affinity propagation, are too expensive to be applied to datasets on the order of 10^6 points, which are often generated by high-throughput experiments. RESULTS: To circumvent these limitations, we developed a new, unsupervised density contour clustering algorithm, called Misty Mountain, that is based on percolation theory and that efficiently analyzes large data sets. The approach can be envisioned as a progressive top-down removal of clouds covering a data histogram relief map to identify clusters by the appearance of statistically distinct peaks and ridges. This is a parallel clustering method that finds every cluster after analyzing the cross sections of the histogram only once. The overall run time for the composite steps of the algorithm increases linearly with the number of data points. The clustering of 10^6 data points in 2D data space takes about 15 seconds on a standard laptop PC. Comparison of the performance of this algorithm with other state-of-the-art automated flow cytometry gating methods indicates that Misty Mountain provides substantial improvements in both run time and accuracy of cluster assignment.
CONCLUSIONS: Misty Mountain is fast, unbiased for cluster shape, identifies stable clusters and is robust to noise. It provides a useful, general solution for multidimensional clustering problems. We demonstrate its suitability for automated gating of flow cytometry data.
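The top-down idea in the description above (lowering a level over a data histogram and watching peaks appear) can be illustrated with a small sketch. This is not the authors' Misty Mountain implementation: it does not use their percolation-theory analysis, and the function name histogram_peak_clusters, the 4-neighbour bin connectivity, and the Poisson-style n_sigma rule for deciding whether two peaks are distinct are all assumptions chosen for illustration.

```python
import numpy as np


def histogram_peak_clusters(points, bins=32, n_sigma=5.0):
    """Label 2D histogram bins by sweeping a count level from the top down.

    Simplified illustration only (hypothetical helper, not the published
    algorithm). Returns (labels, xedges, yedges); labels is -1 for bins
    that are empty or belong to a peak too low to count as a cluster.
    """
    counts, xedges, yedges = np.histogram2d(points[:, 0], points[:, 1], bins=bins)
    parent = {}  # union-find forest over the bins uncovered so far
    peak = {}    # root bin -> count of the highest bin in its component

    def find(b):
        while parent[b] != b:
            parent[b] = parent[parent[b]]  # path halving
            b = parent[b]
        return b

    # Visit occupied bins from highest to lowest count (the falling level).
    occupied = [(int(i), int(j)) for i, j in zip(*np.nonzero(counts))]
    for b in sorted(occupied, key=lambda bin_: -counts[bin_]):
        level = counts[b]
        i, j = b
        neighbours = ((i - 1, j), (i + 1, j), (i, j - 1), (i, j + 1))
        roots = sorted({find(n) for n in neighbours if n in parent},
                       key=lambda r: -peak[r])
        parent[b] = b
        if not roots:
            peak[b] = level  # a new peak emerges at this level
            continue
        main = roots[0]
        parent[b] = main  # the bin joins the tallest adjacent peak
        for r in roots[1:]:
            # Assumed rule: absorb a peak whose height above the current
            # level is within n_sigma Poisson standard deviations.
            if peak[r] - level < n_sigma * np.sqrt(peak[r]):
                parent[r] = main

    labels = np.full(counts.shape, -1, dtype=int)
    ids = {}
    for b in list(parent):
        r = find(b)
        # Apply the same rule against the zero level to drop noise peaks.
        if peak[r] < n_sigma * np.sqrt(peak[r]):
            continue
        labels[b] = ids.setdefault(r, len(ids))
    return labels, xedges, yedges


# Usage on synthetic data: two well-separated Gaussian blobs.
rng = np.random.default_rng(0)
demo = np.vstack([rng.normal([0.0, 0.0], 0.5, (5000, 2)),
                  rng.normal([6.0, 6.0], 0.5, (5000, 2))])
demo_labels, _, _ = histogram_peak_clusters(demo)
print("clusters found:", demo_labels.max() + 1)
```

Each occupied bin is visited once after sorting, so the sweep itself makes a single pass over the histogram; the bin count and the n_sigma threshold are tuning choices of this sketch rather than values taken from the record.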
format Text
id pubmed-2967560
institution National Center for Biotechnology Information
language English
publishDate 2010
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-2967560 2010-11-03 Misty Mountain clustering: application to fast unsupervised flow cytometry gating Sugár, István P Sealfon, Stuart C BMC Bioinformatics Methodology Article BioMed Central 2010-10-09 /pmc/articles/PMC2967560/ /pubmed/20932336 http://dx.doi.org/10.1186/1471-2105-11-502 Text en Copyright ©2010 Sugár and Sealfon; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title Misty Mountain clustering: application to fast unsupervised flow cytometry gating
topic Methodology Article