
Determining clinically relevant features in cytometry data using persistent homology

Cytometry experiments yield high-dimensional point cloud data that is difficult to interpret manually. Boolean gating techniques coupled with comparisons of relative abundances of cellular subsets are the current standard for cytometry data analysis. However, this approach is unable to capture more s...

Bibliographic Details
Main authors: Mukherjee, Soham; Wethington, Darren; Dey, Tamal K.; Das, Jayajit
Format: Online Article Text
Language: English
Published: Public Library of Science 2022
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9009779/
https://www.ncbi.nlm.nih.gov/pubmed/35312683
http://dx.doi.org/10.1371/journal.pcbi.1009931
author Mukherjee, Soham
Wethington, Darren
Dey, Tamal K.
Das, Jayajit
collection PubMed
description Cytometry experiments yield high-dimensional point cloud data that is difficult to interpret manually. Boolean gating techniques coupled with comparisons of relative abundances of cellular subsets are the current standard for cytometry data analysis. However, this approach is unable to capture more subtle topological features hidden in the data, especially if those features are further masked by data transforms, significant batch effects, or donor-to-donor variations in clinical data. We show that persistent homology, a mathematical structure that summarizes topological features, can effectively distinguish different sources of data, such as groups of healthy donors and patients. Analysis of publicly available cytometry data describing non-naïve CD8+ T cells in COVID-19 patients and healthy controls shows that systematic structural differences exist between single-cell protein expression in COVID-19 patients and healthy controls. We identify proteins of interest with a decision-tree-based classifier, sample points randomly, and compute persistence diagrams from these sampled points. The resulting persistence diagrams identify regions of varying density in cytometry datasets and protruding structures such as ‘elbows’. We compute Wasserstein distances between these persistence diagrams for random pairs of healthy controls and COVID-19 patients and find that systematic structural differences exist between COVID-19 patients and healthy controls in the expression data for T-bet, Eomes, and Ki-67. Further analysis shows that expression of T-bet and Eomes is significantly downregulated in non-naïve CD8+ T cells from COVID-19 patients compared to healthy controls. This counter-intuitive finding may indicate that canonical effector CD8+ T cells are less prevalent in COVID-19 patients than in healthy controls. This method is applicable to any cytometry dataset for discovering novel insights through topological data analysis that may be difficult to ascertain with a standard gating strategy or existing bioinformatic tools.
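The sampling, persistence-diagram, and Wasserstein-distance steps named in the description can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes a Vietoris-Rips filtration restricted to 0-dimensional homology (connected components) and a Wasserstein distance of order 1; this record does not state which homology dimensions, filtration, or distance order the paper uses. The function names (subsample, h0_persistence, wasserstein_h0) and the synthetic two-marker data are hypothetical, and the decision-tree step that selects proteins is not shown.

```python
import math
import random


def subsample(cells, k, seed=0):
    """Draw k cells at random without replacement (hypothetical helper)."""
    return random.Random(seed).sample(cells, k)


def h0_persistence(points):
    """Finite death values of the H0 classes of a Vietoris-Rips filtration.

    Every point is born at scale 0. A component dies at the length of the
    edge that first merges it into another component, which is what
    Kruskal's minimum-spanning-tree algorithm tracks with union-find.
    """
    n = len(points)
    edges = sorted(
        (math.dist(points[i], points[j]), i, j)
        for i in range(n)
        for j in range(i + 1, n)
    )
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    deaths = []
    for length, i, j in edges:
        root_i, root_j = find(i), find(j)
        if root_i != root_j:
            parent[root_i] = root_j
            deaths.append(length)
    # n - 1 values in ascending order; the one class that never dies is omitted.
    return deaths


def wasserstein_h0(deaths_a, deaths_b):
    """Order-1 Wasserstein distance between two H0 diagrams born at 0.

    Assumption: matching (0, a) to (0, b) costs |a - b|, and leaving a
    point unmatched sends it to the diagonal at cost d / 2. Because all
    points lie on one line, a dynamic program over the sorted death
    values (as in edit distance) searches the order-preserving matchings.
    """
    a, b = sorted(deaths_a), sorted(deaths_b)
    dp = [[0.0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(1, len(a) + 1):
        dp[i][0] = dp[i - 1][0] + a[i - 1] / 2
    for j in range(1, len(b) + 1):
        dp[0][j] = dp[0][j - 1] + b[j - 1] / 2
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            dp[i][j] = min(
                dp[i - 1][j - 1] + abs(a[i - 1] - b[j - 1]),  # match the pair
                dp[i - 1][j] + a[i - 1] / 2,  # send a[i-1] to the diagonal
                dp[i][j - 1] + b[j - 1] / 2,  # send b[j-1] to the diagonal
            )
    return dp[-1][-1]


if __name__ == "__main__":
    rng = random.Random(1)
    # Synthetic two-marker "expression" values, not real cytometry data.
    donor_a = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(300)]
    donor_b = [(rng.gauss(0, 1), rng.gauss(3 * (i % 2), 1)) for i in range(300)]
    diagram_a = h0_persistence(subsample(donor_a, 60))
    diagram_b = h0_persistence(subsample(donor_b, 60))
    print(wasserstein_h0(diagram_a, diagram_b))
```

Restricting the sketch to H0 keeps it within the standard library; higher-dimensional features such as loops would need a dedicated persistent homology library.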
format Online
Article
Text
id pubmed-9009779
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-9009779 2022-04-15 Determining clinically relevant features in cytometry data using persistent homology Mukherjee, Soham Wethington, Darren Dey, Tamal K. Das, Jayajit PLoS Comput Biol Research Article Public Library of Science 2022-03-21 /pmc/articles/PMC9009779/ /pubmed/35312683 http://dx.doi.org/10.1371/journal.pcbi.1009931 Text en © 2022 Mukherjee et al https://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
title Determining clinically relevant features in cytometry data using persistent homology
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9009779/
https://www.ncbi.nlm.nih.gov/pubmed/35312683
http://dx.doi.org/10.1371/journal.pcbi.1009931