
Homogeneity score test of AC(1) statistics and estimation of common AC(1) in multiple or stratified inter-rater agreement studies


Bibliographic Details
Main authors: Honda, Chikara; Ohyama, Tetsuji
Format: Online Article Text
Language: English
Published: BioMed Central 2020
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7001312/
https://www.ncbi.nlm.nih.gov/pubmed/32020851
http://dx.doi.org/10.1186/s12874-019-0887-5
Description
Summary:
BACKGROUND: Cohen’s κ coefficient is often used as an index of inter-rater agreement. However, κ varies greatly depending on the marginal distribution of the target population and overestimates the probability of agreement occurring by chance. To overcome these limitations, an alternative and more stable agreement coefficient was proposed, referred to as Gwet’s AC(1). When results from multiple agreement studies are to be combined, such as in a meta-analysis, or when a stratified analysis is performed with subject covariates that affect agreement, it is of interest to compare several agreement coefficients and present a common agreement index. A homogeneity test of κ has been developed; however, there are no reports on homogeneity tests for AC(1) or on an estimator of a common AC(1). In this article, a homogeneity score test for AC(1) is therefore derived for the case of two raters with binary outcomes from K independent strata, and its performance is investigated. Estimation of the common AC(1) across strata and its confidence intervals is also discussed.
METHODS: Two homogeneity tests are provided: a score test and a goodness-of-fit test. The confidence intervals are derived by the asymptotic, Fisher’s Z transformation, and profile variance methods. Monte Carlo simulation studies were conducted to examine the validity of the proposed methods. An example using clinical data is also provided.
RESULTS: Type I error rates of the proposed score test were close to the nominal level in simulations with small and moderate sample sizes. The confidence intervals based on Fisher’s Z transformation and the profile variance method provided coverage levels close to nominal over a wide range of parameter combinations.
CONCLUSIONS: The method proposed in this study is considered to be useful for summarizing evaluations of consistency performed in multiple or stratified inter-rater agreement studies, for meta-analysis of reports from multiple groups and for stratified analysis.
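The contrast between κ and AC(1) described in the background can be illustrated with a minimal sketch for a single stratum with two raters and a binary outcome. It uses the standard textbook definitions of Cohen's κ and Gwet's AC(1) for a 2×2 table, not the homogeneity score test or the common-AC(1) estimator derived in the article. The function name `agreement_coefficients` and the example cell counts are hypothetical and chosen only for illustration.

```python
import math


def agreement_coefficients(n11, n10, n01, n00):
    """Return (kappa, ac1) for a 2x2 table of two raters, binary outcome.

    n11: both raters positive, n00: both negative,
    n10: rater A positive / rater B negative, n01: the reverse.
    Hypothetical helper for illustration only.
    """
    n = n11 + n10 + n01 + n00
    if n == 0:
        raise ValueError("the table must contain at least one subject")

    p_a = (n11 + n00) / n      # observed proportion of agreement
    p_row = (n11 + n10) / n    # rater A's proportion of positives
    p_col = (n11 + n01) / n    # rater B's proportion of positives

    # Cohen's kappa: chance agreement from the product of the marginals.
    pe_kappa = p_row * p_col + (1 - p_row) * (1 - p_col)
    # Gwet's AC(1): chance agreement from the mean positive rate pi.
    pi = (p_row + p_col) / 2
    pe_ac1 = 2 * pi * (1 - pi)  # never exceeds 0.5, so 1 - pe_ac1 > 0

    kappa = (p_a - pe_kappa) / (1 - pe_kappa) if pe_kappa < 1 else math.nan
    ac1 = (p_a - pe_ac1) / (1 - pe_ac1)
    return kappa, ac1


# Two illustrative tables, both with 90% observed agreement.
for label, table in [("balanced", (45, 5, 5, 45)), ("skewed", (90, 5, 5, 0))]:
    kappa, ac1 = agreement_coefficients(*table)
    print(f"{label}: kappa={kappa:.3f}, AC1={ac1:.3f}")
```

With balanced marginals both coefficients equal 0.8. With the skewed table, observed agreement is still 90%, yet κ falls slightly below zero (about -0.05) while AC(1) stays near 0.89, which is the sensitivity to the marginal distribution that the abstract refers to.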