A comparison study on modeling of clustered and overdispersed count data for multiple comparisons
Data collected in various scientific fields are count data. One way to analyze such data is to compare the individual levels of the factor treatment using multiple comparisons. However, the measured individuals are often clustered – e.g. according to litter or rearing. This must be considered when e...
Main authors: | Kruppa, Jochen; Hothorn, Ludwig |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Taylor & Francis 2020 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9042126/ https://www.ncbi.nlm.nih.gov/pubmed/35707260 http://dx.doi.org/10.1080/02664763.2020.1788518 |
_version_ | 1784694604626395136 |
---|---|
author | Kruppa, Jochen Hothorn, Ludwig |
author_facet | Kruppa, Jochen Hothorn, Ludwig |
author_sort | Kruppa, Jochen |
collection | PubMed |
description | Data collected in various scientific fields are count data. One way to analyze such data is to compare the individual levels of the factor treatment using multiple comparisons. However, the measured individuals are often clustered – e.g. according to litter or rearing. This must be considered when estimating the parameters by a repeated measurement model. In addition, ignoring the overdispersion to which count data are prone leads to an increase of the type I error rate. We carry out simulation studies using several different data settings and compare different multiple contrast tests with parameter estimates from generalized estimating equations and generalized linear mixed models in order to observe coverage and rejection probabilities. We generate overdispersed, clustered count data in small samples, as can be observed in many biological settings. We have found that generalized estimating equations outperform generalized linear mixed models if the variance-sandwich estimator is correctly specified. Furthermore, generalized linear mixed models show convergence problems under certain data settings, although some model implementations are less affected. Finally, we use an example of genetic data to demonstrate the application of the multiple contrast test and the problems of ignoring strong overdispersion. |
format | Online Article Text |
id | pubmed-9042126 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2020 |
publisher | Taylor & Francis |
record_format | MEDLINE/PubMed |
spelling | pubmed-9042126 2022-06-14 A comparison study on modeling of clustered and overdispersed count data for multiple comparisons Kruppa, Jochen Hothorn, Ludwig J Appl Stat Review Article Data collected in various scientific fields are count data. One way to analyze such data is to compare the individual levels of the factor treatment using multiple comparisons. However, the measured individuals are often clustered – e.g. according to litter or rearing. This must be considered when estimating the parameters by a repeated measurement model. In addition, ignoring the overdispersion to which count data are prone leads to an increase of the type I error rate. We carry out simulation studies using several different data settings and compare different multiple contrast tests with parameter estimates from generalized estimating equations and generalized linear mixed models in order to observe coverage and rejection probabilities. We generate overdispersed, clustered count data in small samples, as can be observed in many biological settings. We have found that generalized estimating equations outperform generalized linear mixed models if the variance-sandwich estimator is correctly specified. Furthermore, generalized linear mixed models show convergence problems under certain data settings, although some model implementations are less affected. Finally, we use an example of genetic data to demonstrate the application of the multiple contrast test and the problems of ignoring strong overdispersion. Taylor & Francis 2020-07-03 /pmc/articles/PMC9042126/ /pubmed/35707260 http://dx.doi.org/10.1080/02664763.2020.1788518 Text en © 2020 The Author(s). Published by Informa UK Limited, trading as Taylor & Francis Group. This is an Open Access article distributed under the terms of the Creative Commons Attribution-NonCommercial-NoDerivatives License (https://creativecommons.org/licenses/by-nc-nd/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited, and is not altered, transformed, or built upon in any way. |
spellingShingle | Review Article Kruppa, Jochen Hothorn, Ludwig A comparison study on modeling of clustered and overdispersed count data for multiple comparisons |
title | A comparison study on modeling of clustered and overdispersed count data for multiple comparisons |
title_full | A comparison study on modeling of clustered and overdispersed count data for multiple comparisons |
title_fullStr | A comparison study on modeling of clustered and overdispersed count data for multiple comparisons |
title_full_unstemmed | A comparison study on modeling of clustered and overdispersed count data for multiple comparisons |
title_short | A comparison study on modeling of clustered and overdispersed count data for multiple comparisons |
title_sort | comparison study on modeling of clustered and overdispersed count data for multiple comparisons |
topic | Review Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9042126/ https://www.ncbi.nlm.nih.gov/pubmed/35707260 http://dx.doi.org/10.1080/02664763.2020.1788518 |
work_keys_str_mv | AT kruppajochen acomparisonstudyonmodelingofclusteredandoverdispersedcountdataformultiplecomparisons AT hothornludwig acomparisonstudyonmodelingofclusteredandoverdispersedcountdataformultiplecomparisons AT kruppajochen comparisonstudyonmodelingofclusteredandoverdispersedcountdataformultiplecomparisons AT hothornludwig comparisonstudyonmodelingofclusteredandoverdispersedcountdataformultiplecomparisons |
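
The abstract in this record refers to overdispersed, clustered count data in small samples. As a rough illustration of what such data can look like, the sketch below simulates counts whose variance clearly exceeds their mean. It is not the authors' simulation code: the function names, the log-normal cluster effect, the gamma-Poisson mixture, and every parameter value are assumptions chosen only for illustration.

```python
import math
import random
import statistics


def poisson(rng, lam):
    """Draw one Poisson(lam) variate with Knuth's method (adequate for small lam)."""
    threshold = math.exp(-lam)
    k, p = 0, rng.random()
    while p > threshold:
        k += 1
        p *= rng.random()
    return k


def simulate_clustered_counts(rng, n_clusters, cluster_size, mean, dispersion, cluster_sd):
    """Return a list of clusters, each a list of overdispersed counts.

    Assumed setup (not taken from the paper): a log-normal cluster effect
    scales the mean, and a gamma-Poisson mixture adds extra-Poisson variation.
    """
    data = []
    for _ in range(n_clusters):
        # Shared multiplicative effect for every observation in this cluster.
        cluster_mean = mean * math.exp(rng.gauss(0.0, cluster_sd))
        counts = []
        for _ in range(cluster_size):
            # Gamma(shape=dispersion, scale=cluster_mean/dispersion) has mean cluster_mean.
            rate = rng.gammavariate(dispersion, cluster_mean / dispersion)
            counts.append(poisson(rng, rate))
        data.append(counts)
    return data


rng = random.Random(1)
clusters = simulate_clustered_counts(
    rng, n_clusters=200, cluster_size=5, mean=4.0, dispersion=2.0, cluster_sd=0.5
)
flat = [y for cluster in clusters for y in cluster]
sample_mean = statistics.mean(flat)
sample_var = statistics.variance(flat)
# A ratio well above 1 indicates overdispersion relative to a Poisson model.
dispersion_ratio = sample_var / sample_mean
print(f"mean={sample_mean:.2f} variance={sample_var:.2f} ratio={dispersion_ratio:.2f}")
```

Under a plain Poisson model the variance-to-mean ratio would be close to 1, so a noticeably larger ratio is one simple sign that a model ignoring overdispersion and clustering would understate the uncertainty.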