Unsupervised Bayesian linear unmixing of gene expression microarrays

BACKGROUND: This paper introduces a new constrained model and the corresponding algorithm, called unsupervised Bayesian linear unmixing (uBLU), to identify biological signatures from high-dimensional assays such as gene expression microarrays. The basis for uBLU is a Bayesian model for the data samples...

Bibliographic Details
Main authors: Bazot, Cécile; Dobigeon, Nicolas; Tourneret, Jean-Yves; Zaas, Aimee K; Ginsburg, Geoffrey S; Hero III, Alfred O
Format: Online Article Text
Language: English
Published: BioMed Central 2013
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3681645/
https://www.ncbi.nlm.nih.gov/pubmed/23506672
http://dx.doi.org/10.1186/1471-2105-14-99
author Bazot, Cécile
Dobigeon, Nicolas
Tourneret, Jean-Yves
Zaas, Aimee K
Ginsburg, Geoffrey S
Hero III, Alfred O
collection PubMed
description BACKGROUND: This paper introduces a new constrained model and the corresponding algorithm, called unsupervised Bayesian linear unmixing (uBLU), to identify biological signatures from high-dimensional assays such as gene expression microarrays. The basis for uBLU is a Bayesian model in which each data sample is represented as an additive mixture of random positive gene signatures, called factors, with random positive mixing coefficients, called factor scores, that specify the relative contribution of each signature to a specific sample. The distinguishing feature of uBLU is that it constrains the factor loadings to be non-negative and the factor scores to be probability distributions over the factors. It also provides an estimate of the number of factors. A Gibbs sampling strategy is adopted to generate random samples according to the posterior distribution of the factors, factor scores, and number of factors. These samples are then used to estimate all the unknown parameters. RESULTS: First, the proposed uBLU method is applied to several simulated datasets with known ground truth and compared with previous factor decomposition methods, such as principal component analysis (PCA), non-negative matrix factorization (NMF), Bayesian factor regression modeling (BFRM), and the gradient-based algorithm for general matrix factorization (GB-GMF). Second, we illustrate the application of uBLU to a real time-evolving gene expression dataset from a recent viral challenge study in which individuals were inoculated with influenza A/H3N2/Wisconsin. We show that the uBLU method significantly outperforms the other methods on the simulated and real datasets considered here. CONCLUSIONS: The results obtained on synthetic and real data illustrate the accuracy of the proposed uBLU method when compared to other factor decomposition methods from the literature (PCA, NMF, BFRM, and GB-GMF). The uBLU method identifies an inflammatory component closely associated with clinical symptom scores collected during the study. Using a constrained model allows recovery of all the inflammatory genes in a single factor.
format Online
Article
Text
id pubmed-3681645
institution National Center for Biotechnology Information
language English
publishDate 2013
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-3681645 2013-06-25 Unsupervised Bayesian linear unmixing of gene expression microarrays Bazot, Cécile Dobigeon, Nicolas Tourneret, Jean-Yves Zaas, Aimee K Ginsburg, Geoffrey S Hero III, Alfred O BMC Bioinformatics Research Article BioMed Central 2013-03-19 /pmc/articles/PMC3681645/ /pubmed/23506672 http://dx.doi.org/10.1186/1471-2105-14-99 Text en Copyright © 2013 Bazot et al.; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title Unsupervised Bayesian linear unmixing of gene expression microarrays
topic Research Article