
A comparison framework and guideline of clustering methods for mass cytometry data

BACKGROUND: With the expanding applications of mass cytometry in medical research, a wide variety of clustering methods, both semi-supervised and unsupervised, have been developed for data analysis. Selecting the optimal clustering method can accelerate the identification of meaningful cell populati...


Bibliographic Details
Main authors: Liu, Xiao; Song, Weichen; Wong, Brandon Y.; Zhang, Ting; Yu, Shunying; Lin, Guan Ning; Ding, Xianting
Format: Online Article Text
Language: English
Published: BioMed Central 2019
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6929440/
https://www.ncbi.nlm.nih.gov/pubmed/31870419
http://dx.doi.org/10.1186/s13059-019-1917-7
_version_ 1783482700823265280
author Liu, Xiao
Song, Weichen
Wong, Brandon Y.
Zhang, Ting
Yu, Shunying
Lin, Guan Ning
Ding, Xianting
author_sort Liu, Xiao
collection PubMed
description BACKGROUND: With the expanding applications of mass cytometry in medical research, a wide variety of clustering methods, both semi-supervised and unsupervised, have been developed for data analysis. Selecting the optimal clustering method can accelerate the identification of meaningful cell populations.
RESULT: To address this issue, we compared three classes of performance measures (“precision” as external evaluation, “coherence” as internal evaluation, and stability) for nine methods on six independent mass cytometry benchmark datasets. The nine methods comprise seven unsupervised methods (Accense, Xshift, PhenoGraph, FlowSOM, flowMeans, DEPECHE, and kmeans) and two semi-supervised methods (Automated Cell-type Discovery and Classification and linear discriminant analysis (LDA)). For each method, we computed and compared all defined performance measures under random subsampling, varying sample sizes, and varying numbers of clusters. LDA reproduces the manual labels most precisely but does not rank top in internal evaluation. PhenoGraph and FlowSOM perform better than the other unsupervised tools in precision, coherence, and stability. PhenoGraph and Xshift are more robust when detecting refined sub-clusters, whereas DEPECHE and FlowSOM tend to group similar clusters into meta-clusters. The performance of PhenoGraph, Xshift, and flowMeans is affected by increasing sample size, whereas FlowSOM is relatively stable as sample size increases.
CONCLUSION: Precision, coherence, stability, and clustering resolution should all be considered together when choosing an appropriate tool for cytometry data analysis. Thus, we provide decision guidelines based on these characteristics to help the general reader choose the most suitable clustering tool.
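The abstract's "precision" measure is an external evaluation: cluster assignments are scored against manual labels. The record does not say which metric the authors used, so the following is only a minimal sketch that assumes the adjusted Rand index (ARI) as a stand-in. The function name `adjusted_rand_index` and the toy labels are hypothetical and written for illustration in plain Python.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_true, labels_pred):
    """Adjusted Rand index between manual labels and cluster assignments.

    Illustrative stand-in for an external ("precision") evaluation;
    not taken from the article itself.
    """
    if len(labels_true) != len(labels_pred):
        raise ValueError("label sequences must have the same length")
    n = len(labels_true)
    if n < 2:
        return 1.0  # no pairs of cells to compare

    # Contingency counts: cells sharing a (manual label, cluster) pair.
    cells = Counter(zip(labels_true, labels_pred))
    rows = Counter(labels_true)
    cols = Counter(labels_pred)

    # Number of cell pairs that agree within each grouping.
    sum_cells = sum(comb(c, 2) for c in cells.values())
    sum_rows = sum(comb(c, 2) for c in rows.values())
    sum_cols = sum(comb(c, 2) for c in cols.values())

    expected = sum_rows * sum_cols / comb(n, 2)  # agreement expected by chance
    max_index = (sum_rows + sum_cols) / 2
    if max_index == expected:
        return 1.0  # degenerate case, e.g. a single group in both labelings
    return (sum_cells - expected) / (max_index - expected)


# Toy data (hypothetical): six cells with three manual populations.
manual = ["T", "T", "B", "B", "NK", "NK"]
fine = [1, 1, 0, 0, 2, 2]    # same partition, different cluster IDs
merged = [0, 0, 0, 0, 1, 1]  # T and B merged into one meta-cluster

print(adjusted_rand_index(manual, fine))
print(adjusted_rand_index(manual, merged))
```

In this toy case the relabeled but otherwise identical partition scores 1.0, while merging two manual populations into one meta-cluster lowers the score, which is one way the resolution differences between tools described in the abstract could show up in an external measure.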
format Online
Article
Text
id pubmed-6929440
institution National Center for Biotechnology Information
language English
publishDate 2019
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-6929440 2019-12-30 A comparison framework and guideline of clustering methods for mass cytometry data Genome Biol Research BioMed Central 2019-12-23 /pmc/articles/PMC6929440/ /pubmed/31870419 http://dx.doi.org/10.1186/s13059-019-1917-7 Text en © The Author(s). 2019 Open Access. This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title A comparison framework and guideline of clustering methods for mass cytometry data
topic Research