
Too many zeros and/or highly skewed? A tutorial on modelling health behaviour as count data with Poisson and negative binomial regression

Background: Dependent variables in health psychology are often counts, for example, of a behaviour or number of engagements with an intervention. These counts can be very strongly skewed, and/or contain large numbers of zeros as well as extreme outliers. For example, ‘How many cigarettes do you smok...


Bibliographic Details
Main author: Green, James A.
Format: Online Article Text
Language: English
Published: Routledge 2021
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8159206/
https://www.ncbi.nlm.nih.gov/pubmed/34104569
http://dx.doi.org/10.1080/21642850.2021.1920416
_version_ 1783700031602163712
author Green, James A.
author_facet Green, James A.
author_sort Green, James A.
collection PubMed
description Background: Dependent variables in health psychology are often counts, for example, of a behaviour or number of engagements with an intervention. These counts can be very strongly skewed, and/or contain large numbers of zeros as well as extreme outliers. For example, ‘How many cigarettes do you smoke on an average day?’ The modal answer may be zero but may range from 0 to 40+. The same can be true for minutes of moderate-to-vigorous physical activity. For some people, this may be near zero, but take on extreme values for someone training for a marathon. Typical analytical strategies for this data involve explicit (or implied) transformations (smoker v. non-smoker, log transformations). However, these data types are ‘counts’ (i.e. non-negative whole numbers) or quasi-counts (time is ratio but discrete minutes of activity could be analysed as a count), and can be modelled using count distributions – including the Poisson and negative binomial distribution (and their zero-inflated and hurdle extensions, which allow even more zeros). Methods: In this tutorial paper I demonstrate (in R, Jamovi, and SPSS) the easy application of these models to health psychology data, and their advantages over alternative ways of analysing this type of data using two datasets – one highly dispersed dependent variable (number of views on YouTube), and another with a large number of zeros (number of days on which symptoms were reported over a month). Results: The negative binomial distribution had the best fit for the overdispersed number of views on YouTube. Negative binomial, and zero-inflated negative binomial were both good fits for the symptom data with over-abundant zeros. Conclusions: In both cases, count distributions provided not just a better fit but would lead to different conclusions compared to the poorly fitting traditional regression/linear models.
format Online
Article
Text
id pubmed-8159206
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher Routledge
record_format MEDLINE/PubMed
spelling pubmed-81592062021-06-07 Too many zeros and/or highly skewed? A tutorial on modelling health behaviour as count data with Poisson and negative binomial regression Green, James A. Health Psychol Behav Med Advanced Methods in Health Psychology and Behavioral Medicine Background: Dependent variables in health psychology are often counts, for example, of a behaviour or number of engagements with an intervention. These counts can be very strongly skewed, and/or contain large numbers of zeros as well as extreme outliers. For example, ‘How many cigarettes do you smoke on an average day?’ The modal answer may be zero but may range from 0 to 40+. The same can be true for minutes of moderate-to-vigorous physical activity. For some people, this may be near zero, but take on extreme values for someone training for a marathon. Typical analytical strategies for this data involve explicit (or implied) transformations (smoker v. non-smoker, log transformations). However, these data types are ‘counts’ (i.e. non-negative whole numbers) or quasi-counts (time is ratio but discrete minutes of activity could be analysed as a count), and can be modelled using count distributions – including the Poisson and negative binomial distribution (and their zero-inflated and hurdle extensions, which allow even more zeros). Methods: In this tutorial paper I demonstrate (in R, Jamovi, and SPSS) the easy application of these models to health psychology data, and their advantages over alternative ways of analysing this type of data using two datasets – one highly dispersed dependent variable (number of views on YouTube), and another with a large number of zeros (number of days on which symptoms were reported over a month). Results: The negative binomial distribution had the best fit for the overdispersed number of views on YouTube. Negative binomial, and zero-inflated negative binomial were both good fits for the symptom data with over-abundant zeros.
Conclusions: In both cases, count distributions provided not just a better fit but would lead to different conclusions compared to the poorly fitting traditional regression/linear models. Routledge 2021-05-06 /pmc/articles/PMC8159206/ /pubmed/34104569 http://dx.doi.org/10.1080/21642850.2021.1920416 Text en © 2021 The Author(s). Published by Informa UK Limited, trading as Taylor & Francis Group https://creativecommons.org/licenses/by/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
spellingShingle Advanced Methods in Health Psychology and Behavioral Medicine
Green, James A.
Too many zeros and/or highly skewed? A tutorial on modelling health behaviour as count data with Poisson and negative binomial regression
title Too many zeros and/or highly skewed? A tutorial on modelling health behaviour as count data with Poisson and negative binomial regression
title_full Too many zeros and/or highly skewed? A tutorial on modelling health behaviour as count data with Poisson and negative binomial regression
title_fullStr Too many zeros and/or highly skewed? A tutorial on modelling health behaviour as count data with Poisson and negative binomial regression
title_full_unstemmed Too many zeros and/or highly skewed? A tutorial on modelling health behaviour as count data with Poisson and negative binomial regression
title_short Too many zeros and/or highly skewed? A tutorial on modelling health behaviour as count data with Poisson and negative binomial regression
title_sort too many zeros and/or highly skewed? a tutorial on modelling health behaviour as count data with poisson and negative binomial regression
topic Advanced Methods in Health Psychology and Behavioral Medicine
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8159206/
https://www.ncbi.nlm.nih.gov/pubmed/34104569
http://dx.doi.org/10.1080/21642850.2021.1920416
work_keys_str_mv AT greenjamesa toomanyzerosandorhighlyskewedatutorialonmodellinghealthbehaviourascountdatawithpoissonandnegativebinomialregression