Bayesian clustering of multiple zero-inflated outcomes

Several applications involving counts present a large proportion of zeros (excess-of-zeros data). A popular model for such data is the hurdle model, which explicitly models the probability of a zero count, while assuming a sampling distribution on the positive integers. We consider data from multipl...

Bibliographic Details
Main authors: Franzolini, Beatrice; Cremaschi, Andrea; van den Boom, Willem; De Iorio, Maria
Format: Online Article Text
Language: English
Published: The Royal Society 2023
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10041346/
https://www.ncbi.nlm.nih.gov/pubmed/36970823
http://dx.doi.org/10.1098/rsta.2022.0145
_version_ 1784912696292933632
author Franzolini, Beatrice
Cremaschi, Andrea
van den Boom, Willem
De Iorio, Maria
collection PubMed
description Several applications involving counts present a large proportion of zeros (excess-of-zeros data). A popular model for such data is the hurdle model, which explicitly models the probability of a zero count, while assuming a sampling distribution on the positive integers. We consider data from multiple count processes. In this context, it is of interest to study the patterns of counts and cluster the subjects accordingly. We introduce a novel Bayesian approach to cluster multiple, possibly related, zero-inflated processes. We propose a joint model for zero-inflated counts, specifying a hurdle model for each process with a shifted Negative Binomial sampling distribution. Conditionally on the model parameters, the different processes are assumed independent, leading to a substantial reduction in the number of parameters as compared with traditional multivariate approaches. The subject-specific probabilities of zero-inflation and the parameters of the sampling distribution are flexibly modelled via an enriched finite mixture with random number of components. This induces a two-level clustering of the subjects based on the zero/non-zero patterns (outer clustering) and on the sampling distribution (inner clustering). Posterior inference is performed through tailored Markov chain Monte Carlo schemes. We demonstrate the proposed approach on an application involving the use of the messaging service WhatsApp. This article is part of the theme issue ‘Bayesian inference: challenges, perspectives, and prospects’.
format Online
Article
Text
id pubmed-10041346
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher The Royal Society
record_format MEDLINE/PubMed
spelling pubmed-10041346 2023-03-28 Bayesian clustering of multiple zero-inflated outcomes Franzolini, Beatrice; Cremaschi, Andrea; van den Boom, Willem; De Iorio, Maria Philos Trans A Math Phys Eng Sci Articles The Royal Society 2023-05-15 2023-03-27 /pmc/articles/PMC10041346/ /pubmed/36970823 http://dx.doi.org/10.1098/rsta.2022.0145 Text en © 2023 The Authors.
Published by the Royal Society under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, provided the original author and source are credited.
title Bayesian clustering of multiple zero-inflated outcomes
topic Articles