
A comparison study on modeling of clustered and overdispersed count data for multiple comparisons

Data collected in various scientific fields are count data. One way to analyze such data is to compare the individual levels of the factor treatment using multiple comparisons. However, the measured individuals are often clustered – e.g. according to litter or rearing. This must be considered when e...


Bibliographic Details
Main Authors: Kruppa, Jochen; Hothorn, Ludwig
Format: Online Article Text
Language: English
Published: Taylor & Francis 2020
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9042126/
https://www.ncbi.nlm.nih.gov/pubmed/35707260
http://dx.doi.org/10.1080/02664763.2020.1788518
author Kruppa, Jochen
Hothorn, Ludwig
collection PubMed
description Data collected in various scientific fields are count data. One way to analyze such data is to compare the individual levels of the treatment factor using multiple comparisons. However, the measured individuals are often clustered, e.g. according to litter or rearing. This must be considered when estimating the parameters by a repeated measurement model. In addition, ignoring the overdispersion to which count data are prone leads to an increase in the type I error rate. We carry out simulation studies using several different data settings and compare different multiple contrast tests with parameter estimates from generalized estimation equations and generalized linear mixed models in order to observe coverage and rejection probabilities. We generate overdispersed, clustered count data in small samples, as can be observed in many biological settings. We have found that the generalized estimation equations outperform generalized linear mixed models if the variance-sandwich estimator is correctly specified. Furthermore, generalized linear mixed models show problems with the convergence rate under certain data settings, but there are model implementations that are less affected. Finally, we use an example of genetic data to demonstrate the application of the multiple contrast test and the problems of ignoring strong overdispersion.
format Online
Article
Text
id pubmed-9042126
institution National Center for Biotechnology Information
language English
publishDate 2020
publisher Taylor & Francis
record_format MEDLINE/PubMed
spelling pubmed-9042126 2022-06-14 J Appl Stat Review Article Taylor & Francis 2020-07-03 /pmc/articles/PMC9042126/ /pubmed/35707260 http://dx.doi.org/10.1080/02664763.2020.1788518 Text en © 2020 The Author(s).
Published by Informa UK Limited, trading as Taylor & Francis Group. This is an Open Access article distributed under the terms of the Creative Commons Attribution-NonCommercial-NoDerivatives License (https://creativecommons.org/licenses/by-nc-nd/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited, and is not altered, transformed, or built upon in any way.
title A comparison study on modeling of clustered and overdispersed count data for multiple comparisons
topic Review Article