
Pass/fail decisions and standards: the impact of differential examiner stringency on OSCE outcomes

Variation in examiner stringency is a recognised problem in many standardised summative assessments of performance such as the OSCE. The stated strength of the OSCE is that such error might largely balance out over the exam as a whole. This study uses linear mixed models to estimate the impact of di...


Bibliographic Details
Main author:	Homer, Matt
Format:	Online Article Text
Language:	English
Published:	Springer Netherlands 2022
Subjects:	Article
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9117341/ https://www.ncbi.nlm.nih.gov/pubmed/35230590 http://dx.doi.org/10.1007/s10459-022-10096-9

author	Homer, Matt
collection	PubMed
description	Variation in examiner stringency is a recognised problem in many standardised summative assessments of performance such as the OSCE. The stated strength of the OSCE is that such error might largely balance out over the exam as a whole. This study uses linear mixed models to estimate the impact of different factors (examiner, station, candidate and exam) on station-level total domain score and, separately, on a single global grade. The exam data are from 442 separate administrations of an 18-station OSCE for international medical graduates who want to work in the National Health Service in the UK. We find that variation due to examiner is approximately twice as large for domain scores as it is for grades (16% vs. 8%), with smaller residual variance in the former (67% vs. 76%). Combined estimates of exam-level (relative) reliability across all data are 0.75 and 0.69 for domain scores and grades respectively. The correlation between two separate estimates of stringency for individual examiners (one for grades and one for domain scores) is relatively high (r=0.76), implying that examiners are generally quite consistent in their stringency between these two assessments of performance. Cluster analysis indicates that examiners fall into two broad groups characterised as hawks or doves on both measures. At the exam level, correcting for examiner stringency produces systematically lower cut-scores under borderline regression standard setting than using the raw marks. In turn, such a correction would produce higher pass rates, although meaningful direct comparisons are challenging to make. As in other studies, this work shows that OSCEs and other standardised performance assessments are subject to substantial variation in examiner stringency, and require sufficient domain sampling to ensure quality of pass/fail decision-making is at least adequate. More, perhaps qualitative, work is needed to understand better how examiners might score similarly (or differently) between the awarding of station-level domain scores and global grades. The issue of the potential systematic bias of borderline regression evidenced for the first time here, with sources of error producing cut-scores higher than they should be, also needs more investigation.
format	Online Article Text
id	pubmed-9117341
institution	National Center for Biotechnology Information
language	English
publishDate	2022
publisher	Springer Netherlands
record_format	MEDLINE/PubMed
spelling	pubmed-9117341 2022-05-20 Pass/fail decisions and standards: the impact of differential examiner stringency on OSCE outcomes Homer, Matt Adv Health Sci Educ Theory Pract Article Springer Netherlands 2022-03-01 2022 /pmc/articles/PMC9117341/ /pubmed/35230590 http://dx.doi.org/10.1007/s10459-022-10096-9 Text en © The Author(s) 2022 https://creativecommons.org/licenses/by/4.0/ Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
title	Pass/fail decisions and standards: the impact of differential examiner stringency on OSCE outcomes
topic	Article
