A comparison framework and guideline of clustering methods for mass cytometry data
BACKGROUND: With the expanding applications of mass cytometry in medical research, a wide variety of clustering methods, both semi-supervised and unsupervised, have been developed for data analysis. Selecting the optimal clustering method can accelerate the identification of meaningful cell populati...
Main authors: | Liu, Xiao; Song, Weichen; Wong, Brandon Y.; Zhang, Ting; Yu, Shunying; Lin, Guan Ning; Ding, Xianting |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central 2019 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6929440/ https://www.ncbi.nlm.nih.gov/pubmed/31870419 http://dx.doi.org/10.1186/s13059-019-1917-7 |
_version_ | 1783482700823265280 |
---|---|
author | Liu, Xiao Song, Weichen Wong, Brandon Y. Zhang, Ting Yu, Shunying Lin, Guan Ning Ding, Xianting |
author_facet | Liu, Xiao Song, Weichen Wong, Brandon Y. Zhang, Ting Yu, Shunying Lin, Guan Ning Ding, Xianting |
author_sort | Liu, Xiao |
collection | PubMed |
description | BACKGROUND: With the expanding applications of mass cytometry in medical research, a wide variety of clustering methods, both semi-supervised and unsupervised, have been developed for data analysis. Selecting the optimal clustering method can accelerate the identification of meaningful cell populations. RESULT: To address this issue, we compared three classes of performance measures, “precision” as external evaluation, “coherence” as internal evaluation, and stability, of nine methods based on six independent benchmark datasets. Seven unsupervised methods (Accense, Xshift, PhenoGraph, FlowSOM, flowMeans, DEPECHE, and kmeans) and two semi-supervised methods (Automated Cell-type Discovery and Classification and linear discriminant analysis (LDA)) are tested on six mass cytometry datasets. We compute and compare all defined performance measures against random subsampling, varying sample sizes, and the number of clusters for each method. LDA reproduces the manual labels most precisely but does not rank top in internal evaluation. PhenoGraph and FlowSOM perform better than other unsupervised tools in precision, coherence, and stability. PhenoGraph and Xshift are more robust when detecting refined sub-clusters, whereas DEPECHE and FlowSOM tend to group similar clusters into meta-clusters. The performances of PhenoGraph, Xshift, and flowMeans are impacted by increased sample size, but FlowSOM is relatively stable as sample size increases. CONCLUSION: All the evaluations including precision, coherence, stability, and clustering resolution should be taken into synthetic consideration when choosing an appropriate tool for cytometry data analysis. Thus, we provide decision guidelines based on these characteristics for the general reader to more easily choose the most suitable clustering tools. |
format | Online Article Text |
id | pubmed-6929440 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2019 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-6929440 2019-12-30 A comparison framework and guideline of clustering methods for mass cytometry data Liu, Xiao Song, Weichen Wong, Brandon Y. Zhang, Ting Yu, Shunying Lin, Guan Ning Ding, Xianting Genome Biol Research BioMed Central 2019-12-23 /pmc/articles/PMC6929440/ /pubmed/31870419 http://dx.doi.org/10.1186/s13059-019-1917-7 Text en © The Author(s). 2019 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated. |
spellingShingle | Research Liu, Xiao Song, Weichen Wong, Brandon Y. Zhang, Ting Yu, Shunying Lin, Guan Ning Ding, Xianting A comparison framework and guideline of clustering methods for mass cytometry data |
title | A comparison framework and guideline of clustering methods for mass cytometry data |
topic | Research |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6929440/ https://www.ncbi.nlm.nih.gov/pubmed/31870419 http://dx.doi.org/10.1186/s13059-019-1917-7 |
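
The abstract in this record describes "precision" as an external evaluation, meaning a comparison of cluster assignments against manual labels. As a minimal sketch of that idea, the Python below computes cluster purity, one common external measure. It is not necessarily one of the measures the authors used, and the function name `purity` and the toy data are hypothetical, chosen only for illustration.

```python
from collections import Counter, defaultdict


def purity(manual_labels, cluster_ids):
    """Fraction of cells whose manual label equals the majority label of their cluster."""
    if len(manual_labels) != len(cluster_ids):
        raise ValueError("inputs must have the same length")
    if not manual_labels:
        raise ValueError("inputs must be non-empty")

    # Count how often each manual label occurs inside each cluster.
    by_cluster = defaultdict(Counter)
    for label, cluster in zip(manual_labels, cluster_ids):
        by_cluster[cluster][label] += 1

    # Each cluster contributes the size of its largest manual-label group.
    majority_total = sum(max(counts.values()) for counts in by_cluster.values())
    return majority_total / len(manual_labels)


# Hypothetical toy data: six cells with manual gates and cluster assignments.
manual = ["T", "T", "T", "B", "B", "NK"]
clusters = [0, 0, 1, 1, 1, 2]
score = purity(manual, clusters)
```

Purity reaches 1.0 trivially when every cell is placed in its own cluster, which is one reason an external score of this kind is usually read alongside the number of clusters and alongside internal and stability measures rather than on its own.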