
Measurement precision at the cut score in medical multiple choice exams: Theory matters



Bibliographic details
Main authors:	Lahner, Felicitas-Maria; Schauber, Stefan; Lörwald, Andrea Carolin; Kropf, Roger; Guttormsen, Sissel; Fischer, Martin R.; Huwendiek, Sören
Format:	Online Article Text
Language:	English
Published:	Bohn Stafleu van Loghum 2020
Subjects:	Original Article
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7459012/ https://www.ncbi.nlm.nih.gov/pubmed/32468274 http://dx.doi.org/10.1007/s40037-020-00586-0

Abstract
INTRODUCTION: In high-stakes assessment, the measurement precision of pass-fail decisions is of great importance. A concept for analyzing the measurement precision at the cut score is conditional reliability, which describes measurement precision for every score achieved in an exam. We compared conditional reliabilities in Classical Test Theory (CTT) and Item Response Theory (IRT) with a special focus on the cut score and potential factors influencing conditional reliability at the cut score.
METHODS: We analyzed 32 multiple-choice exams from three Swiss medical schools, comparing conditional reliability at the cut score in IRT and CTT. Additionally, we used multiple regression to analyze potential influencing factors such as the range of examinees’ performance, year of study, and number of items.
RESULTS: In CTT, conditional reliability was highest for very low and very high scores, whereas examinees with medium scores showed low conditional reliabilities. In IRT, the maximum conditional reliability was in the middle of the scale. Therefore, conditional reliability at the cut score was significantly higher in IRT than in CTT. It was influenced by the range of examinees’ performance and the number of items; this influence was more pronounced in CTT.
DISCUSSION: We found that conditional reliability shows inverse distributions, and that conclusions regarding the measurement precision at the cut score differ depending on the theory used. As IRT seems more appropriate for criterion-oriented standard setting in the framework of competency-based medical education, our findings might have practical implications for the design and quality assurance of medical education assessments.
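
The inverse CTT and IRT patterns described in the abstract can be illustrated with a small sketch. This is not the authors' analysis. It assumes Lord's binomial error model for the CTT conditional standard error of measurement, CSEM²(x) = x(n − x)/(n − 1), and a Rasch model for IRT, where the squared standard error is 1/I(θ). Both are expressed as conditional reliability = 1 − (conditional error variance / total variance). The test length (40 items), the observed raw-score variance, and the evenly spaced item difficulties are hypothetical values chosen only for illustration.

```python
import math

# Hypothetical exam settings (assumptions for illustration, not data from the paper).
N_ITEMS = 40
OBSERVED_SCORE_VAR = 25.0  # hypothetical observed raw-score variance (SD = 5 points)
THETA_VAR = 1.0            # ability variance on the usual standardized IRT scale


def ctt_conditional_reliability(x: int, n: int = N_ITEMS,
                                var_x: float = OBSERVED_SCORE_VAR) -> float:
    """CTT via Lord's binomial error model: CSEM^2(x) = x(n - x)/(n - 1),
    expressed as conditional reliability 1 - CSEM^2 / var(X)."""
    csem_sq = x * (n - x) / (n - 1)
    return 1.0 - csem_sq / var_x


def rasch_information(theta: float, difficulties: list[float]) -> float:
    """Rasch model test information: I(theta) = sum over items of p_i * (1 - p_i)."""
    total = 0.0
    for b in difficulties:
        p = 1.0 / (1.0 + math.exp(-(theta - b)))  # probability of a correct answer
        total += p * (1.0 - p)
    return total


def irt_conditional_reliability(theta: float, difficulties: list[float],
                                var_theta: float = THETA_VAR) -> float:
    """SE^2(theta) = 1 / I(theta); conditional reliability = 1 - SE^2 / var(theta)."""
    return 1.0 - 1.0 / (rasch_information(theta, difficulties) * var_theta)


# Hypothetical item difficulties spread evenly from -2 to +2 logits.
DIFFICULTIES = [-2.0 + 4.0 * i / (N_ITEMS - 1) for i in range(N_ITEMS)]

if __name__ == "__main__":
    print("CTT: raw score -> conditional reliability")
    for x in (2, 10, 20, 30, 38):
        print(f"  x = {x:2d}: {ctt_conditional_reliability(x):.2f}")
    print("IRT: theta -> conditional reliability")
    for theta in (-3.0, -1.5, 0.0, 1.5, 3.0):
        print(f"  theta = {theta:+.1f}: "
              f"{irt_conditional_reliability(theta, DIFFICULTIES):.2f}")
```

Under these assumptions, the CTT values are lowest at mid-scale (x = 20) and rise to 1 at the ends of the score range. The IRT values peak at θ = 0 and fall toward the extremes. This is the inverse pattern the abstract reports. A cut score near the middle of the scale therefore looks less precise under CTT than under IRT.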

Journal:	Perspectives on Medical Education (Perspect Med Educ)
Dates:	2020-05-28 (online); 2020-08 (issue)
License:	© The Author(s) 2020. Open Access. This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

