
A comparison of Cohen’s Kappa and Gwet’s AC1 when calculating inter-rater reliability coefficients: a study conducted with personality disorder samples

BACKGROUND: Rater agreement is important in clinical research, and Cohen’s Kappa is a widely used method for assessing inter-rater reliability; however, there are well documented statistical problems associated with the measure. In order to assess its utility, we evaluated it against Gwet’s AC1 and...

Bibliographic Details
Main Authors:	Wongpakaran, Nahathai; Wongpakaran, Tinakon; Wedding, Danny; Gwet, Kilem L
Format:	Online Article Text
Language:	English
Published:	BioMed Central 2013
Subjects:	Research Article
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3643869/ https://www.ncbi.nlm.nih.gov/pubmed/23627889 http://dx.doi.org/10.1186/1471-2288-13-61

author	Wongpakaran, Nahathai Wongpakaran, Tinakon Wedding, Danny Gwet, Kilem L
collection	PubMed
description	BACKGROUND: Rater agreement is important in clinical research, and Cohen’s Kappa is a widely used method for assessing inter-rater reliability; however, there are well-documented statistical problems associated with the measure. In order to assess its utility, we evaluated it against Gwet’s AC1 and compared the results. METHODS: This study was carried out across 67 patients (56% males) aged 18 to 67, with a mean ± SD age of 44.13 ± 12.68 years. Nine raters (7 psychiatrists, a psychiatry resident and a social worker) participated as interviewers, either for the first or the second interviews, which were held 4 to 6 weeks apart. The interviews were held in order to establish a personality disorder (PD) diagnosis using DSM-IV criteria. Cohen’s Kappa and Gwet’s AC1 were used, and the level of agreement between raters was assessed in terms of a simple categorical diagnosis (i.e., the presence or absence of a disorder). Data were also compared with a previous analysis in order to evaluate the effects of trait prevalence. RESULTS: Gwet’s AC1 was shown to have higher inter-rater reliability coefficients for all the PD criteria, ranging from .752 to 1.000, whereas Cohen’s Kappa ranged from 0 to 1.00. Cohen’s Kappa values were high and close to the percentage of agreement when the prevalence was high, whereas Gwet’s AC1 values appeared not to change much with a change in prevalence, but remained close to the percentage of agreement. For example, a Schizoid sample revealed a mean Cohen’s Kappa of .726 and a Gwet’s AC1 of .853, values that fall within different levels of agreement according to the criteria developed by Landis and Koch, and Altman and Fleiss. CONCLUSIONS: Based on the different formulae used to calculate the level of chance-corrected agreement, Gwet’s AC1 was shown to provide a more stable inter-rater reliability coefficient than Cohen’s Kappa. It was also found to be less affected by prevalence and marginal probability than Cohen’s Kappa, and therefore should be considered for use with inter-rater reliability analysis.
format	Online Article Text
id	pubmed-3643869
institution	National Center for Biotechnology Information
language	English
publishDate	2013
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-3643869 2013-05-09 BMC Med Res Methodol Research Article BioMed Central 2013-04-29 /pmc/articles/PMC3643869/ /pubmed/23627889 http://dx.doi.org/10.1186/1471-2288-13-61 Text en Copyright © 2013 Wongpakaran et al.; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title	A comparison of Cohen’s Kappa and Gwet’s AC1 when calculating inter-rater reliability coefficients: a study conducted with personality disorder samples
topic	Research Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3643869/ https://www.ncbi.nlm.nih.gov/pubmed/23627889 http://dx.doi.org/10.1186/1471-2288-13-61
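
The abstract's point about prevalence can be illustrated with a minimal sketch for two raters and a binary (present/absent) diagnosis. The function name and the 2x2 counts below are hypothetical and are not taken from the study; the formulas are the standard two-rater, two-category forms of Cohen's Kappa and Gwet's AC1, which differ only in how chance agreement is estimated.

```python
from typing import Tuple


def agreement_coefficients(a: int, b: int, c: int, d: int) -> Tuple[float, float, float]:
    """Return (percent agreement, Cohen's Kappa, Gwet's AC1) for a 2x2 table.

    a: both raters say "present"      b: rater 1 "present", rater 2 "absent"
    c: rater 1 "absent", rater 2 "present"      d: both raters say "absent"
    """
    n = a + b + c + d
    if n == 0:
        raise ValueError("the table must contain at least one rated subject")

    p_o = (a + d) / n   # observed agreement
    p1 = (a + b) / n    # proportion rated "present" by rater 1
    q1 = (a + c) / n    # proportion rated "present" by rater 2

    # Cohen's Kappa: chance agreement from the product of the raters' marginals.
    pe_kappa = p1 * q1 + (1 - p1) * (1 - q1)
    # Kappa is undefined when both raters use a single category for everyone.
    kappa = (p_o - pe_kappa) / (1 - pe_kappa) if pe_kappa < 1 else float("nan")

    # Gwet's AC1: chance agreement from the mean "present" proportion.
    pi = (p1 + q1) / 2
    pe_ac1 = 2 * pi * (1 - pi)  # never exceeds 0.5, so the denominator is safe
    ac1 = (p_o - pe_ac1) / (1 - pe_ac1)

    return p_o, kappa, ac1


# Hypothetical tables with the same 96% observed agreement.
rare = agreement_coefficients(1, 2, 2, 95)       # trait present in about 3% of ratings
balanced = agreement_coefficients(48, 2, 2, 48)  # trait present in about 50% of ratings
print("rare trait:     p_o=%.3f kappa=%.3f AC1=%.3f" % rare)
print("balanced trait: p_o=%.3f kappa=%.3f AC1=%.3f" % balanced)
```

With these made-up counts, both tables have 96% observed agreement. For the rare trait, Kappa falls to roughly 0.31 while AC1 stays near 0.96; for the balanced trait, both equal 0.92.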
