
CytoNorm: A Normalization Algorithm for Cytometry Data

High‐dimensional flow cytometry has matured to a level that enables deep phenotyping of cellular systems at a clinical scale. The resulting high‐content data sets allow characterizing the human immune system at unprecedented single cell resolution. However, the results are highly dependent on sample...


Bibliographic Details
Main authors: Van Gassen, Sofie, Gaudilliere, Brice, Angst, Martin S., Saeys, Yvan, Aghaeepour, Nima
Format: Online Article Text
Language: English
Published: John Wiley & Sons, Inc. 2019
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7078957/
https://www.ncbi.nlm.nih.gov/pubmed/31633883
http://dx.doi.org/10.1002/cyto.a.23904
author Van Gassen, Sofie
Gaudilliere, Brice
Angst, Martin S.
Saeys, Yvan
Aghaeepour, Nima
collection PubMed
description High‐dimensional flow cytometry has matured to a level that enables deep phenotyping of cellular systems at a clinical scale. The resulting high‐content data sets allow characterizing the human immune system at unprecedented single cell resolution. However, the results are highly dependent on sample preparation and measurements might drift over time. While various controls exist for assessment and improvement of data quality in a single sample, the challenges of cross‐sample normalization attempts have been limited to aligning marker distributions across subjects. These approaches, inspired by bulk genomics and proteomics assays, ignore the single‐cell nature of the data and risk the removal of biologically relevant signals. This work proposes CytoNorm, a normalization algorithm to ensure internal consistency between clinical samples based on shared controls across various study batches. Data from the shared controls is used to learn the appropriate transformations for each batch (e.g., each analysis day). Importantly, some sources of technical variation are strongly influenced by the amount of protein expressed on specific cell types, requiring several population‐specific transformations to normalize cells from a heterogeneous sample. To address this, our approach first identifies the overall cellular distribution using a clustering step, and calculates subset‐specific transformations on the control samples by computing their quantile distributions and aligning them with splines. These transformations are then applied to all other clinical samples in the batch to remove the batch‐specific variations. We evaluated the algorithm on a customized data set with two shared controls across batches. One control sample was used for calculation of the normalization transformations and the second control was used as a blinded test set and evaluated with Earth Mover's distance. Additional results are provided using two real‐world clinical data sets. 
Overall, our method compared favorably to standard normalization procedures. The algorithm is implemented in the R package “CytoNorm” and available via the following link: http://www.github.com/saeyslab/CytoNorm © 2019 The Authors. Cytometry Part A published by Wiley Periodicals, Inc. on behalf of International Society for Advancement of Cytometry.
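The normalization steps summarized in the description above can be illustrated with a minimal sketch. It is written in Python rather than R and is not the CytoNorm package's API: all function names are hypothetical, cluster labels are assumed to be already available (the abstract only states that a clustering step is used), and piecewise-linear interpolation between quantiles stands in for the spline alignment. The Earth Mover's distance is computed in its simple one-dimensional form for equally sized samples.

```python
from bisect import bisect_right
from statistics import quantiles


def quantile_points(values, n_quantiles=9):
    """Evenly spaced interior quantiles (deciles by default) of one marker."""
    return quantiles(values, n=n_quantiles + 1, method="inclusive")


def interpolate(x, xs, ys):
    """Piecewise-linear map through (xs, ys); end segments extend linearly.

    Assumption: this stands in for the spline alignment named in the abstract.
    """
    i = min(max(bisect_right(xs, x) - 1, 0), len(xs) - 2)
    x0, x1, y0, y1 = xs[i], xs[i + 1], ys[i], ys[i + 1]
    if x1 == x0:  # tied quantiles: avoid division by zero
        return y0
    return y0 + (y1 - y0) * (x - x0) / (x1 - x0)


def fit_batch(batch_control, reference_control, n_quantiles=9):
    """Learn one quantile mapping per cluster from control-sample values.

    Both arguments map a cluster label to that cluster's marker values.
    """
    transforms = {}
    for cluster, values in batch_control.items():
        src = quantile_points(values, n_quantiles)
        dst = quantile_points(reference_control[cluster], n_quantiles)
        transforms[cluster] = (src, dst)
    return transforms


def normalize(cells, transforms):
    """Apply the cluster-specific mapping to (cluster, value) pairs."""
    return [interpolate(v, *transforms[c]) for c, v in cells]


def earth_movers_distance(a, b):
    """1-D Earth Mover's distance for two samples of equal size."""
    if len(a) != len(b):
        raise ValueError("this sketch assumes equally sized samples")
    return sum(abs(x - y) for x, y in zip(sorted(a), sorted(b))) / len(a)


# Toy data: the batch control is the reference control shifted and rescaled.
reference = {"T cells": [float(v) for v in range(101)]}
batch = {"T cells": [2.0 * v + 1.0 for v in reference["T cells"]]}
transforms = fit_batch(batch, reference)

# A second control from the same batch plays the role of the held-out test set.
validation = [("T cells", 2.0 * v + 1.0) for v in range(0, 101, 5)]
target = [float(v) for v in range(0, 101, 5)]
normalized = normalize(validation, transforms)
emd_before = earth_movers_distance([v for _, v in validation], target)
emd_after = earth_movers_distance(normalized, target)
```

In this toy setup the batch effect is a pure shift and rescaling, so the quantile mapping learned on the first control should also bring the held-out control back onto the reference scale, and emd_after should be far smaller than emd_before. Real batch effects are not this regular, which is why the held-out control matters for evaluation.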
format Online
Article
Text
id pubmed-7078957
institution National Center for Biotechnology Information
language English
publishDate 2019
publisher John Wiley & Sons, Inc.
record_format MEDLINE/PubMed
spelling pubmed-7078957 2020-03-19 CytoNorm: A Normalization Algorithm for Cytometry Data Van Gassen, Sofie Gaudilliere, Brice Angst, Martin S. Saeys, Yvan Aghaeepour, Nima Cytometry A Original Articles John Wiley & Sons, Inc. 2019-10-21 2020-03 /pmc/articles/PMC7078957/ /pubmed/31633883 http://dx.doi.org/10.1002/cyto.a.23904 Text en © 2019 The Authors. Cytometry Part A published by Wiley Periodicals, Inc. on behalf of International Society for Advancement of Cytometry. This is an open access article under the terms of the http://creativecommons.org/licenses/by/4.0/ License, which permits use, distribution and reproduction in any medium, provided the original work is properly cited.
title CytoNorm: A Normalization Algorithm for Cytometry Data
topic Original Articles
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7078957/
https://www.ncbi.nlm.nih.gov/pubmed/31633883
http://dx.doi.org/10.1002/cyto.a.23904