
Optimizing transformations for automated, high throughput analysis of flow cytometry data

Bibliographic details
Main authors: Finak, Greg; Perez, Juan-Manuel; Weng, Andrew; Gottardo, Raphael
Format: Online, Article, Text
Language: English
Published: BioMed Central, 2010
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3243046/
https://www.ncbi.nlm.nih.gov/pubmed/21050468
http://dx.doi.org/10.1186/1471-2105-11-546
author Finak, Greg
Perez, Juan-Manuel
Weng, Andrew
Gottardo, Raphael
collection PubMed
description BACKGROUND: In a high throughput setting, effective flow cytometry data analysis depends heavily on proper data preprocessing. While the usual preprocessing steps of quality assessment, outlier removal, normalization, and gating have received considerable scrutiny from the community, the influence of data transformation on the output of high throughput analysis has been largely overlooked. Flow cytometry measurements can vary over several orders of magnitude, cell populations can have variances that depend on their mean fluorescence intensities, and populations may exhibit heavily skewed distributions. Consequently, the choice of data transformation can influence the output of automated gating. An appropriate data transformation aids in data visualization and gating of cell populations across the range of data. Experience shows that the choice of transformation is data specific. Our goal here is to compare the performance of different transformations applied to flow cytometry data in the context of automated gating in a high throughput, fully automated setting. We examine the most common transformations used in flow cytometry, including the generalized hyperbolic arcsine, biexponential, linlog, and generalized Box-Cox, all within the BioConductor flowCore framework that is widely used in high throughput, automated flow cytometry data analysis. All of these transformations have adjustable parameters whose effects upon the data are non-intuitive for most users. By making some modelling assumptions about the transformed data, we develop maximum likelihood criteria to optimize parameter choice for these different transformations.

RESULTS: We compare the performance of parameter-optimized and default-parameter (in flowCore) data transformations on real and simulated data by measuring the variation in the locations of cell populations across samples, discovered via automated gating in both the scatter and fluorescence channels. We find that parameter-optimized transformations improve visualization, reduce variability in the location of discovered cell populations across samples, and decrease the misclassification (mis-gating) of individual events when compared to default-parameter counterparts.

CONCLUSIONS: Our results indicate that the preferred transformation for fluorescence channels is a parameter-optimized biexponential or generalized Box-Cox, in accordance with current best practices. Interestingly, for populations in the scatter channels, we find that the optimized hyperbolic arcsine may be a better choice in a high-throughput setting than the current standard practice of no transformation. However, generally speaking, the choice of transformation remains data-dependent. We have implemented our algorithm in the BioConductor package flowTrans, which is publicly available.
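The abstract says transformation parameters are chosen by maximum likelihood under modelling assumptions about the transformed data, but it does not spell out the criterion. As a hedged illustration of the general idea only, the sketch below assumes the classical Box-Cox transform of strictly positive data and a single Gaussian population, and picks λ by maximizing the textbook profile log-likelihood ℓ(λ) = -(n/2) log σ̂²(λ) + (λ - 1) Σ log x_i. This is not the flowTrans criterion, not the generalized Box-Cox used in the paper, and not a flowCore API; the function names `box_cox`, `profile_log_likelihood` and `optimize_lambda`, and the default grid bounds [-2, 2], are hypothetical.

```python
import math
from typing import Sequence


def box_cox(x: float, lam: float) -> float:
    """Classical Box-Cox transform of one strictly positive value."""
    if x <= 0:
        raise ValueError("classical Box-Cox requires strictly positive data")
    if abs(lam) < 1e-12:
        return math.log(x)  # limit of (x**lam - 1) / lam as lam -> 0
    return (x ** lam - 1.0) / lam


def profile_log_likelihood(data: Sequence[float], lam: float) -> float:
    """Gaussian profile log-likelihood of lam, up to an additive constant.

    Assumes the transformed values are i.i.d. normal (a single population).
    The mean and variance are replaced by their maximum likelihood estimates,
    and the Jacobian term (lam - 1) * sum(log x) accounts for the change of
    variables from x to the transformed scale.
    """
    n = len(data)
    if n < 2:
        raise ValueError("need at least two observations")
    y = [box_cox(x, lam) for x in data]
    mean = sum(y) / n
    var = sum((v - mean) ** 2 for v in y) / n  # MLE variance divides by n
    if var <= 0.0:
        return float("-inf")  # degenerate case: all transformed values equal
    jacobian = (lam - 1.0) * sum(math.log(x) for x in data)
    return -0.5 * n * math.log(var) + jacobian


def optimize_lambda(
    data: Sequence[float], lo: float = -2.0, hi: float = 2.0, steps: int = 401
) -> float:
    """Grid search for the lam that maximizes the profile log-likelihood."""
    if steps < 2 or hi <= lo:
        raise ValueError("need steps >= 2 and hi > lo")
    grid = [lo + i * (hi - lo) / (steps - 1) for i in range(steps)]
    return max(grid, key=lambda lam: profile_log_likelihood(data, lam))
```

For data whose logarithm is symmetric, the search should land at or near λ = 0, i.e. a log transform. A bounded grid keeps the sketch transparent; a one-dimensional optimizer could replace it. Compensated fluorescence values can be zero or negative, which is why a generalized transform, rather than this classical form, is needed for real flow cytometry channels.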
format Online
Article
Text
id pubmed-3243046
institution National Center for Biotechnology Information
language English
publishDate 2010
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-3243046 (2011-12-20); BMC Bioinformatics, Methodology Article; BioMed Central, 2010-11-04; Text, en
Copyright ©2010 Finak et al; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title Optimizing transformations for automated, high throughput analysis of flow cytometry data
topic Methodology Article