
Predicting Cell Populations in Single Cell Mass Cytometry Data

Mass cytometry by time‐of‐flight (CyTOF) is a valuable technology for high‐dimensional analysis at the single cell level. Identification of different cell populations is an important task during the data analysis. Many clustering tools can perform this task, which is essential to identify “new” cell...


Bibliographic Details
Main authors: Abdelaal, Tamim, van Unen, Vincent, Höllt, Thomas, Koning, Frits, Reinders, Marcel J.T., Mahfouz, Ahmed
Format: Online Article Text
Language: English
Published: John Wiley & Sons, Inc. 2019
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6767556/
https://www.ncbi.nlm.nih.gov/pubmed/30861637
http://dx.doi.org/10.1002/cyto.a.23738
author Abdelaal, Tamim
van Unen, Vincent
Höllt, Thomas
Koning, Frits
Reinders, Marcel J.T.
Mahfouz, Ahmed
collection PubMed
description Mass cytometry by time‐of‐flight (CyTOF) is a valuable technology for high‐dimensional analysis at the single cell level. Identification of different cell populations is an important task during the data analysis. Many clustering tools can perform this task, which is essential to identify “new” cell populations in explorative experiments. However, relying on clustering is laborious since it often involves manual annotation, which significantly limits the reproducibility of identifying cell‐populations across different samples. The latter is particularly important in studies comparing different conditions, for example in cohort studies. Learning cell populations from an annotated set of cells solves these problems. However, currently available methods for automatic cell population identification are either complex, dependent on prior biological knowledge about the populations during the learning process, or can only identify canonical cell populations. We propose to use a linear discriminant analysis (LDA) classifier to automatically identify cell populations in CyTOF data. LDA outperforms two state‐of‐the‐art algorithms on four benchmark datasets. Compared to more complex classifiers, LDA has substantial advantages with respect to the interpretable performance, reproducibility, and scalability to larger datasets with deeper annotations. We apply LDA to a dataset of ~3.5 million cells representing 57 cell populations in the Human Mucosal Immune System. LDA has high performance on abundant cell populations as well as the majority of rare cell populations, and provides accurate estimates of cell population frequencies. Further incorporating a rejection option, based on the estimated posterior probabilities, allows LDA to identify previously unknown (new) cell populations that were not encountered during training. Altogether, reproducible prediction of cell population compositions using LDA opens up possibilities to analyze large cohort studies based on CyTOF data. © 2019 The Authors. Cytometry Part A published by Wiley Periodicals, Inc. on behalf of International Society for Advancement of Cytometry.
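The abstract names three ingredients: an LDA classifier trained on annotated cells, a rejection option based on posterior probabilities, and estimated population frequencies. The record gives no implementation details, so the following is only a minimal sketch of that idea and not the authors' code. The two-marker toy data, the population names, the 0.7 rejection threshold, and every function name are assumptions made for illustration. It uses only the Python standard library; a real analysis would more likely rely on an established LDA implementation.

```python
import math
from collections import Counter, defaultdict


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - tail) / M[i][i]
    return x


def fit_lda(X, y):
    """Fit LDA: per-class linear scores built from a pooled covariance."""
    groups = defaultdict(list)
    for row, label in zip(X, y):
        groups[label].append(row)
    d, n = len(X[0]), len(X)
    means = {k: [sum(col) / len(rows) for col in zip(*rows)]
             for k, rows in groups.items()}
    # Pooled within-class covariance, shared by all classes.
    cov = [[0.0] * d for _ in range(d)]
    for k, rows in groups.items():
        for row in rows:
            diff = [v - m for v, m in zip(row, means[k])]
            for i in range(d):
                for j in range(d):
                    cov[i][j] += diff[i] * diff[j]
    dof = n - len(groups)
    cov = [[v / dof for v in r] for r in cov]
    model = {}
    for k, rows in groups.items():
        w = solve(cov, means[k])  # w = Sigma^-1 mu_k
        prior = len(rows) / n
        b = -0.5 * sum(wi * mi for wi, mi in zip(w, means[k])) + math.log(prior)
        model[k] = (w, b)
    return model


def posteriors(model, x):
    """Posterior probability of each class: softmax of the linear scores."""
    scores = {k: sum(wi * xi for wi, xi in zip(w, x)) + b
              for k, (w, b) in model.items()}
    top = max(scores.values())
    exps = {k: math.exp(s - top) for k, s in scores.items()}
    total = sum(exps.values())
    return {k: e / total for k, e in exps.items()}


def predict(model, x, threshold=0.7):
    """Return the most probable class, or 'unknown' if its posterior is low."""
    post = posteriors(model, x)
    label = max(post, key=post.get)
    return label if post[label] >= threshold else "unknown"


def population_frequencies(labels):
    """Fraction of cells assigned to each label."""
    counts = Counter(labels)
    return {k: c / len(labels) for k, c in counts.items()}


# Toy training set (assumed): two hypothetical markers per cell.
X_train = [[4.0, 0.2], [4.2, 0.0], [3.8, -0.2], [4.0, 0.0],
           [0.2, 4.0], [0.0, 4.2], [-0.2, 3.8], [0.0, 4.0]]
y_train = ["T cell"] * 4 + ["B cell"] * 4
model = fit_lda(X_train, y_train)

new_cells = [[4.0, 0.0], [4.1, 0.1], [0.0, 4.0], [2.0, 2.0]]
labels = [predict(model, cell) for cell in new_cells]
frequencies = population_frequencies(labels)
```

In this toy setup the cell at (2.0, 2.0) lies midway between the two class means, so neither posterior reaches the threshold and the cell is labeled "unknown". Note that a posterior threshold of this kind flags cells that are ambiguous between known populations; a cell far from every class mean can still receive a confident posterior, so how well rejection isolates truly new populations depends on the data.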
format Online
Article
Text
id pubmed-6767556
institution National Center for Biotechnology Information
language English
publishDate 2019
publisher John Wiley & Sons, Inc.
record_format MEDLINE/PubMed
spelling pubmed-6767556 2019-10-03 Predicting Cell Populations in Single Cell Mass Cytometry Data Abdelaal, Tamim van Unen, Vincent Höllt, Thomas Koning, Frits Reinders, Marcel J.T. Mahfouz, Ahmed Cytometry A Original Articles John Wiley & Sons, Inc. 2019-03-12 2019-07 /pmc/articles/PMC6767556/ /pubmed/30861637 http://dx.doi.org/10.1002/cyto.a.23738 Text en © 2019 The Authors. Cytometry Part A published by Wiley Periodicals, Inc. on behalf of International Society for Advancement of Cytometry. This is an open access article under the terms of the http://creativecommons.org/licenses/by-nc/4.0/ License, which permits use, distribution and reproduction in any medium, provided the original work is properly cited and is not used for commercial purposes.
title Predicting Cell Populations in Single Cell Mass Cytometry Data
topic Original Articles