Measuring inter-rater reliability for nominal data – which coefficients and confidence intervals are appropriate?
BACKGROUND: Reliability of measurements is a prerequisite of medical research. For nominal data, Fleiss’ kappa (in the following labelled as Fleiss’ K) and Krippendorff’s alpha provide the highest flexibility of the available reliability measures with respect to number of raters and categories. Our...
Main Authors: | Zapf, Antonia; Castell, Stefanie; Morawietz, Lars; Karch, André |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central 2016 |
Subjects: | |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4974794/ https://www.ncbi.nlm.nih.gov/pubmed/27495131 http://dx.doi.org/10.1186/s12874-016-0200-9 |
_version_ | 1782446612854865920 |
---|---|
author | Zapf, Antonia; Castell, Stefanie; Morawietz, Lars; Karch, André |
author_facet | Zapf, Antonia; Castell, Stefanie; Morawietz, Lars; Karch, André |
author_sort | Zapf, Antonia |
collection | PubMed |
description | BACKGROUND: Reliability of measurements is a prerequisite of medical research. For nominal data, Fleiss’ kappa (in the following labelled as Fleiss’ K) and Krippendorff’s alpha provide the highest flexibility of the available reliability measures with respect to number of raters and categories. Our aim was to investigate which measures and which confidence intervals provide the best statistical properties for the assessment of inter-rater reliability in different situations. METHODS: We performed a large simulation study to investigate the precision of the estimates for Fleiss’ K and Krippendorff’s alpha and to determine the empirical coverage probability of the corresponding confidence intervals (asymptotic for Fleiss’ K and bootstrap for both measures). Furthermore, we compared measures and confidence intervals in a real-world case study. RESULTS: Point estimates of Fleiss’ K and Krippendorff’s alpha did not differ from each other in all scenarios. In the case of missing data (completely at random), Krippendorff’s alpha provided stable estimates, while the complete case analysis approach for Fleiss’ K led to biased estimates. For shifted null hypotheses, the coverage probability of the asymptotic confidence interval for Fleiss’ K was low, while the bootstrap confidence intervals for both measures provided a coverage probability close to the theoretical one. CONCLUSIONS: Fleiss’ K and Krippendorff’s alpha with bootstrap confidence intervals are equally suitable for the analysis of reliability of complete nominal data. The asymptotic confidence interval for Fleiss’ K should not be used. In the case of missing data or data of higher than nominal order, Krippendorff’s alpha is recommended. Together with this article, we provide an R-script for calculating Fleiss’ K and Krippendorff’s alpha and their corresponding bootstrap confidence intervals.
ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s12874-016-0200-9) contains supplementary material, which is available to authorized users. |
format | Online Article Text |
id | pubmed-4974794 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2016 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-4974794 2016-08-06 Measuring inter-rater reliability for nominal data – which coefficients and confidence intervals are appropriate? Zapf, Antonia Castell, Stefanie Morawietz, Lars Karch, André BMC Med Res Methodol Research Article BACKGROUND: Reliability of measurements is a prerequisite of medical research. For nominal data, Fleiss’ kappa (in the following labelled as Fleiss’ K) and Krippendorff’s alpha provide the highest flexibility of the available reliability measures with respect to number of raters and categories. Our aim was to investigate which measures and which confidence intervals provide the best statistical properties for the assessment of inter-rater reliability in different situations. METHODS: We performed a large simulation study to investigate the precision of the estimates for Fleiss’ K and Krippendorff’s alpha and to determine the empirical coverage probability of the corresponding confidence intervals (asymptotic for Fleiss’ K and bootstrap for both measures). Furthermore, we compared measures and confidence intervals in a real-world case study. RESULTS: Point estimates of Fleiss’ K and Krippendorff’s alpha did not differ from each other in all scenarios. In the case of missing data (completely at random), Krippendorff’s alpha provided stable estimates, while the complete case analysis approach for Fleiss’ K led to biased estimates. For shifted null hypotheses, the coverage probability of the asymptotic confidence interval for Fleiss’ K was low, while the bootstrap confidence intervals for both measures provided a coverage probability close to the theoretical one. CONCLUSIONS: Fleiss’ K and Krippendorff’s alpha with bootstrap confidence intervals are equally suitable for the analysis of reliability of complete nominal data. The asymptotic confidence interval for Fleiss’ K should not be used. In the case of missing data or data of higher than nominal order, Krippendorff’s alpha is recommended.
Together with this article, we provide an R-script for calculating Fleiss’ K and Krippendorff’s alpha and their corresponding bootstrap confidence intervals. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s12874-016-0200-9) contains supplementary material, which is available to authorized users. BioMed Central 2016-08-05 /pmc/articles/PMC4974794/ /pubmed/27495131 http://dx.doi.org/10.1186/s12874-016-0200-9 Text en © The Author(s). 2016 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated. |
spellingShingle | Research Article Zapf, Antonia Castell, Stefanie Morawietz, Lars Karch, André Measuring inter-rater reliability for nominal data – which coefficients and confidence intervals are appropriate? |
title | Measuring inter-rater reliability for nominal data – which coefficients and confidence intervals are appropriate? |
title_full | Measuring inter-rater reliability for nominal data – which coefficients and confidence intervals are appropriate? |
title_fullStr | Measuring inter-rater reliability for nominal data – which coefficients and confidence intervals are appropriate? |
title_full_unstemmed | Measuring inter-rater reliability for nominal data – which coefficients and confidence intervals are appropriate? |
title_short | Measuring inter-rater reliability for nominal data – which coefficients and confidence intervals are appropriate? |
title_sort | measuring inter-rater reliability for nominal data – which coefficients and confidence intervals are appropriate? |
topic | Research Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4974794/ https://www.ncbi.nlm.nih.gov/pubmed/27495131 http://dx.doi.org/10.1186/s12874-016-0200-9 |
work_keys_str_mv | AT zapfantonia measuringinterraterreliabilityfornominaldatawhichcoefficientsandconfidenceintervalsareappropriate AT castellstefanie measuringinterraterreliabilityfornominaldatawhichcoefficientsandconfidenceintervalsareappropriate AT morawietzlars measuringinterraterreliabilityfornominaldatawhichcoefficientsandconfidenceintervalsareappropriate AT karchandre measuringinterraterreliabilityfornominaldatawhichcoefficientsandconfidenceintervalsareappropriate |
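
The abstract recommends Fleiss’ K with a bootstrap confidence interval for complete nominal data. The following is a minimal Python sketch of that idea, not the authors' R-script: the function names `fleiss_kappa` and `bootstrap_ci`, the percentile interval, and the choice to resample whole subjects with replacement are all assumptions made here for illustration. It handles complete data only (every subject rated by the same number of raters).

```python
import random
from collections import Counter


def fleiss_kappa(ratings):
    """Fleiss' kappa for complete nominal data.

    ratings: list of subjects; each subject is a list holding one
    category label per rater (same number of raters for every subject).
    """
    n_subjects = len(ratings)
    n_raters = len(ratings[0])
    category_totals = Counter()
    observed = 0.0
    for subject in ratings:
        counts = Counter(subject)
        category_totals.update(counts)
        # Share of agreeing rater pairs for this subject.
        observed += (sum(c * c for c in counts.values()) - n_raters) / (
            n_raters * (n_raters - 1)
        )
    observed /= n_subjects
    total = n_subjects * n_raters
    # Chance agreement from the pooled category proportions.
    expected = sum((c / total) ** 2 for c in category_totals.values())
    if expected == 1.0:
        return float("nan")  # only one category in use; kappa is undefined
    return (observed - expected) / (1.0 - expected)


def bootstrap_ci(ratings, n_boot=1000, level=0.95, seed=0):
    """Percentile bootstrap interval (assumed scheme: resample subjects)."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_boot):
        sample = [rng.choice(ratings) for _ in ratings]
        kappa = fleiss_kappa(sample)
        if kappa == kappa:  # drop undefined (NaN) resamples
            estimates.append(kappa)
    if not estimates:
        raise ValueError("kappa was undefined in every bootstrap resample")
    estimates.sort()
    lower = estimates[int((1 - level) / 2 * len(estimates))]
    upper = estimates[int((1 + level) / 2 * len(estimates)) - 1]
    return lower, upper
```

Calling `bootstrap_ci(ratings)` on a list of per-subject rating lists returns a `(lower, upper)` pair. Missing ratings are deliberately not supported in this sketch; the abstract recommends Krippendorff’s alpha for that case.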