Fast approximate inference for variable selection in Dirichlet process mixtures, with an application to pan-cancer proteomics

Bibliographic Details
Main authors:	Crook, Oliver M., Gatto, Laurent, Kirk, Paul D.W.
Format:	Online Article Text
Language:	English
Published:	2019
Subjects:	Article
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7614016/ https://www.ncbi.nlm.nih.gov/pubmed/31829970 http://dx.doi.org/10.1515/sagmb-2018-0065

_version_	1783605553521491968
author	Crook, Oliver M. Gatto, Laurent Kirk, Paul D.W.
collection	PubMed
description	The Dirichlet Process (DP) mixture model has become a popular choice for model-based clustering, largely because it allows the number of clusters to be inferred. The sequential updating and greedy search (SUGS) algorithm (Wang & Dunson, 2011) was proposed as a fast method for performing approximate Bayesian inference in DP mixture models, by posing clustering as a Bayesian model selection (BMS) problem and avoiding the use of computationally costly Markov chain Monte Carlo methods. Here we consider how this approach may be extended to permit variable selection for clustering, and also demonstrate the benefits of Bayesian model averaging (BMA) in place of BMS. Through an array of simulation examples and well-studied examples from cancer transcriptomics, we show that our method performs competitively with the current state-of-the-art, while also offering computational benefits. We apply our approach to reverse-phase protein array (RPPA) data from The Cancer Genome Atlas (TCGA) in order to perform a pan-cancer proteomic characterisation of 5157 tumour samples. We have implemented our approach, together with the original SUGS algorithm, in an open-source R package named sugsvarsel, which accelerates analysis by performing intensive computations in C++ and provides automated parallel processing. The R package is freely available from: https://github.com/ococrook/sugsvarsel
format	Online Article Text
id	pubmed-7614016
institution	National Center for Biotechnology Information
language	English
publishDate	2019
record_format	MEDLINE/PubMed
spelling	pubmed-7614016 2023-01-03 Crook, Oliver M. Gatto, Laurent Kirk, Paul D.W. Stat Appl Genet Mol Biol Article 2019-12-12 /pmc/articles/PMC7614016/ /pubmed/31829970 http://dx.doi.org/10.1515/sagmb-2018-0065 Text en This work is licensed under the Creative Commons Attribution 4.0 Public License https://creativecommons.org/licenses/by/4.0/.
title	Fast approximate inference for variable selection in Dirichlet process mixtures, with an application to pan-cancer proteomics
topic	Article
