Compositional Data Analysis using Kernels in mass cytometry data

MOTIVATION: Cell-type abundance data arising from mass cytometry experiments are compositional in nature. Classical association tests do not apply to the compositional data due to their non-Euclidean nature. Existing methods for analysis of cell type abundance data suffer from several limitations fo...

Bibliographic Details
Main Authors: Rudra, Pratyaydipta; Baxter, Ryan; Hsieh, Elena W Y; Ghosh, Debashis
Format: Online Article Text
Language: English
Published: Oxford University Press 2022
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8867823/
https://www.ncbi.nlm.nih.gov/pubmed/35224501
http://dx.doi.org/10.1093/bioadv/vbac003
author Rudra, Pratyaydipta
Baxter, Ryan
Hsieh, Elena W Y
Ghosh, Debashis
collection PubMed
description MOTIVATION: Cell-type abundance data arising from mass cytometry experiments are compositional in nature. Classical association tests do not apply to compositional data because of their non-Euclidean nature. Existing methods for analyzing cell-type abundance data suffer from several limitations for high-dimensional mass cytometry data, especially when the sample size is small. RESULTS: We propose a new multivariate statistical learning methodology, Compositional Data Analysis using Kernels (CODAK), based on the kernel distance covariance (KDC) framework, to test the association of cell-type compositions with important predictors (categorical or continuous) such as disease status. CODAK scales well to high-dimensional data and provides satisfactory performance for small sample sizes (n < 25). We conducted simulation studies to compare the performance of the method with existing methods for analyzing cell-type abundance data from mass cytometry studies. The method is also applied to a high-dimensional dataset containing different subgroups of populations, including Systemic Lupus Erythematosus (SLE) patients and healthy control subjects. AVAILABILITY AND IMPLEMENTATION: CODAK is implemented in R. The code and data used in this manuscript are available at http://github.com/GhoshLab/CODAK/. CONTACT: prudra@okstate.edu SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics Advances online.
format Online
Article
Text
id pubmed-8867823
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-8867823 2022-02-25 Compositional Data Analysis using Kernels in mass cytometry data Rudra, Pratyaydipta Baxter, Ryan Hsieh, Elena W Y Ghosh, Debashis Bioinform Adv Original Article Oxford University Press 2022-02-11 /pmc/articles/PMC8867823/ /pubmed/35224501 http://dx.doi.org/10.1093/bioadv/vbac003 Text en © The Author(s) 2022. Published by Oxford University Press.
https://creativecommons.org/licenses/by/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
title Compositional Data Analysis using Kernels in mass cytometry data
topic Original Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8867823/
https://www.ncbi.nlm.nih.gov/pubmed/35224501
http://dx.doi.org/10.1093/bioadv/vbac003