Measurement precision at the cut score in medical multiple choice exams: Theory matters

INTRODUCTION: In high-stakes assessment, the measurement precision of pass-fail decisions is of great importance. A concept for analyzing the measurement precision at the cut score is conditional reliability, which describes measurement precision for every score achieved in an exam. We compared cond...

Bibliographic Details
Main Authors: Lahner, Felicitas-Maria; Schauber, Stefan; Lörwald, Andrea Carolin; Kropf, Roger; Guttormsen, Sissel; Fischer, Martin R.; Huwendiek, Sören
Format: Online Article Text
Language: English
Published: Bohn Stafleu van Loghum 2020
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7459012/
https://www.ncbi.nlm.nih.gov/pubmed/32468274
http://dx.doi.org/10.1007/s40037-020-00586-0
author Lahner, Felicitas-Maria
Schauber, Stefan
Lörwald, Andrea Carolin
Kropf, Roger
Guttormsen, Sissel
Fischer, Martin R.
Huwendiek, Sören
collection PubMed
description INTRODUCTION: In high-stakes assessment, the measurement precision of pass-fail decisions is of great importance. A concept for analyzing the measurement precision at the cut score is conditional reliability, which describes measurement precision for every score achieved in an exam. We compared conditional reliabilities in Classical Test Theory (CTT) and Item Response Theory (IRT) with a special focus on the cut score and potential factors influencing conditional reliability at the cut score. METHODS: We analyzed 32 multiple-choice exams from three Swiss medical schools comparing conditional reliability at the cut score in IRT and CTT. Additionally, we analyzed potential influencing factors such as the range of examinees’ performance, year of study, and number of items using multiple regression. RESULTS: In CTT, conditional reliability was highest for very low and very high scores, whereas examinees with medium scores showed low conditional reliabilities. In IRT, the maximum conditional reliability was in the middle of the scale. Therefore, conditional reliability at the cut score was significantly higher in IRT compared with CTT. It was influenced by the range of examinees’ performance and number of items. This influence was more pronounced in CTT. DISCUSSION: We found that conditional reliability shows inverse distributions and conclusions regarding the measurement precision at the cut score depending on the theory used. As the use of IRT seems to be more appropriate for criterion-oriented standard setting in the framework of competency-based medical education, our findings might have practical implications for the design and quality assurance of medical education assessments.
format Online
Article
Text
id pubmed-7459012
institution National Center for Biotechnology Information
language English
publishDate 2020
publisher Bohn Stafleu van Loghum
record_format MEDLINE/PubMed
spelling pubmed-7459012 2020-09-15 Measurement precision at the cut score in medical multiple choice exams: Theory matters Lahner, Felicitas-Maria Schauber, Stefan Lörwald, Andrea Carolin Kropf, Roger Guttormsen, Sissel Fischer, Martin R. Huwendiek, Sören Perspect Med Educ Original Article INTRODUCTION: In high-stakes assessment, the measurement precision of pass-fail decisions is of great importance. A concept for analyzing the measurement precision at the cut score is conditional reliability, which describes measurement precision for every score achieved in an exam. We compared conditional reliabilities in Classical Test Theory (CTT) and Item Response Theory (IRT) with a special focus on the cut score and potential factors influencing conditional reliability at the cut score. METHODS: We analyzed 32 multiple-choice exams from three Swiss medical schools comparing conditional reliability at the cut score in IRT and CTT. Additionally, we analyzed potential influencing factors such as the range of examinees’ performance, year of study, and number of items using multiple regression. RESULTS: In CTT, conditional reliability was highest for very low and very high scores, whereas examinees with medium scores showed low conditional reliabilities. In IRT, the maximum conditional reliability was in the middle of the scale. Therefore, conditional reliability at the cut score was significantly higher in IRT compared with CTT. It was influenced by the range of examinees’ performance and number of items. This influence was more pronounced in CTT. DISCUSSION: We found that conditional reliability shows inverse distributions and conclusions regarding the measurement precision at the cut score depending on the theory used. As the use of IRT seems to be more appropriate for criterion-oriented standard setting in the framework of competency-based medical education, our findings might have practical implications for the design and quality assurance of medical education assessments.
Bohn Stafleu van Loghum 2020-05-28 2020-08 /pmc/articles/PMC7459012/ /pubmed/32468274 http://dx.doi.org/10.1007/s40037-020-00586-0 Text en © The Author(s) 2020 Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
title Measurement precision at the cut score in medical multiple choice exams: Theory matters
topic Original Article