A non-parametric Bayesian model for joint cell clustering and cluster matching: identification of anomalous sample phenotypes with random effects


Bibliographic Details
Main authors:	Dundar, Murat, Akova, Ferit, Yerebakan, Halid Z, Rajwa, Bartek
Format:	Online Article Text
Language:	English
Published:	BioMed Central 2014
Subjects:	Research Article
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4262223/ https://www.ncbi.nlm.nih.gov/pubmed/25248977 http://dx.doi.org/10.1186/1471-2105-15-314

collection	PubMed
description	BACKGROUND: Flow cytometry (FC)-based computer-aided diagnostics is an emerging technique utilizing modern multiparametric cytometry systems. The major difficulty in using machine-learning approaches for classification of FC data arises from limited access to a wide variety of anomalous samples for training. Consequently, any model trained on an abundance of normal cases and a limited set of specific anomalous cases is biased towards the types of anomalies represented in the training set. Such models do not accurately identify anomalies, whether previously known or unknown, that may be present in future test samples. Although one-class classifiers trained using only normal cases would avoid such a bias, robust sample characterization is critical for a generalizable model. Owing to sample heterogeneity and instrumental variability, arbitrary characterization of samples usually introduces feature noise that may lead to poor predictive performance. Herein, we present a non-parametric Bayesian algorithm called ASPIRE (anomalous sample phenotype identification with random effects) that identifies phenotypic differences across a batch of samples in the presence of random effects. Our approach simultaneously clusters the cellular measurements in each sample and matches the discovered clusters across all samples, using probabilistic sampling techniques to recover global clusters systematically. RESULTS: We demonstrate the performance of the proposed method in identifying anomalous samples in two FC data sets: one containing acute myeloid leukemia (AML) cases, and the other a generic 5-parameter peripheral-blood immunophenotyping data set. Results are evaluated in terms of the area under the receiver operating characteristic curve (AUC). ASPIRE achieved AUCs of 0.99 and 1.0 on the AML and generic blood immunophenotyping data sets, respectively.
CONCLUSIONS: These results demonstrate that anomalous samples can be identified by ASPIRE with almost perfect accuracy without a priori access to samples of anomalous subtypes in the training set. The ASPIRE approach is unique in its ability to form generalizations regarding normal and anomalous states given only very weak assumptions regarding sample characteristics and origin. Thus, ASPIRE could become highly instrumental in providing unique insights about observed biological phenomena in the absence of full information about the investigated samples. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/1471-2105-15-314) contains supplementary material, which is available to authorized users.
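The description says ASPIRE clusters the cells within each sample and matches the discovered clusters across samples to recover global clusters. This record does not give the model's equations, so the following is only a generic illustration under an assumption: it draws labels from the Chinese restaurant franchise prior of a hierarchical Dirichlet process, a standard non-parametric construction in which local clusters ("tables") within each sample are matched to shared global clusters ("dishes"). It is not ASPIRE's sampler. It omits the likelihood of the cytometry measurements and the random-effects terms. The function name `crf_prior_draw` and the parameters `alpha` and `gamma` are hypothetical.

```python
import random
from collections import Counter


def crf_prior_draw(sample_sizes, alpha, gamma, seed=0):
    """Hypothetical helper: draw cluster labels from a Chinese restaurant
    franchise prior (generic HDP prior, assumed here for illustration only).

    alpha: within-sample concentration (how readily new local clusters open).
    gamma: across-sample concentration (how readily new global clusters open).
    Returns (labels, dish_table_counts): labels[s][i] is the global cluster of
    cell i in sample s; dish_table_counts[k] is how many local clusters, over
    all samples, are matched to global cluster k.
    """
    rng = random.Random(seed)
    dish_table_counts = []  # local clusters matched to each global cluster
    labels = []
    for n_cells in sample_sizes:
        table_sizes = []  # number of cells in each local cluster of this sample
        table_dish = []   # global cluster matched to each local cluster
        sample_labels = []
        for _ in range(n_cells):
            # Join existing local cluster t with probability proportional to
            # its size, or open a new one with probability proportional to alpha.
            weights = table_sizes + [alpha]
            t = rng.choices(range(len(weights)), weights=weights)[0]
            if t == len(table_sizes):
                # New local cluster: match it to global cluster k with probability
                # proportional to the number of local clusters already matched to k,
                # or create a new global cluster with probability proportional to gamma.
                dish_weights = dish_table_counts + [gamma]
                k = rng.choices(range(len(dish_weights)), weights=dish_weights)[0]
                if k == len(dish_table_counts):
                    dish_table_counts.append(0)
                dish_table_counts[k] += 1
                table_sizes.append(0)
                table_dish.append(k)
            table_sizes[t] += 1
            sample_labels.append(table_dish[t])
        labels.append(sample_labels)
    return labels, dish_table_counts


if __name__ == "__main__":
    labels, dish_counts = crf_prior_draw([200, 150, 300], alpha=1.0, gamma=2.0, seed=42)
    for s, lab in enumerate(labels):
        # Global clusters shared across samples play the role of matched phenotypes.
        print(f"sample {s}: cells per global cluster = {dict(sorted(Counter(lab).items()))}")
```

Larger values of `gamma` tend to produce more global clusters, and larger values of `alpha` tend to produce more local clusters per sample. In a full model, each assignment would also be weighted by how well the cells fit the cluster's distribution.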
id	pubmed-4262223
institution	National Center for Biotechnology Information
record_format	MEDLINE/PubMed
spelling	pubmed-4262223 2014-12-11 BMC Bioinformatics Research Article BioMed Central 2014-09-24 /pmc/articles/PMC4262223/ /pubmed/25248977 http://dx.doi.org/10.1186/1471-2105-15-314 Text en © Dundar et al.; licensee BioMed Central Ltd. 2014. This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
