Measurement precision at the cut score in medical multiple choice exams: Theory matters
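Main authors: | Lahner, Felicitas-Maria; Schauber, Stefan; Lörwald, Andrea Carolin; Kropf, Roger; Guttormsen, Sissel; Fischer, Martin R.; Huwendiek, Sören |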
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Bohn Stafleu van Loghum, 2020 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7459012/ https://www.ncbi.nlm.nih.gov/pubmed/32468274 http://dx.doi.org/10.1007/s40037-020-00586-0 |
Field | Value |
---|---|
author | Lahner, Felicitas-Maria; Schauber, Stefan; Lörwald, Andrea Carolin; Kropf, Roger; Guttormsen, Sissel; Fischer, Martin R.; Huwendiek, Sören |
collection | PubMed |
description | INTRODUCTION: In high-stakes assessment, the measurement precision of pass-fail decisions is of great importance. A concept for analyzing the measurement precision at the cut score is conditional reliability, which describes measurement precision for every score achieved in an exam. We compared conditional reliabilities in Classical Test Theory (CTT) and Item Response Theory (IRT) with a special focus on the cut score and potential factors influencing conditional reliability at the cut score. METHODS: We analyzed 32 multiple-choice exams from three Swiss medical schools, comparing conditional reliability at the cut score in IRT and CTT. Additionally, we analyzed potential influencing factors such as the range of examinees’ performance, year of study, and number of items using multiple regression. RESULTS: In CTT, conditional reliability was highest for very low and very high scores, whereas examinees with medium scores showed low conditional reliabilities. In IRT, the maximum conditional reliability was in the middle of the scale. Therefore, conditional reliability at the cut score was significantly higher in IRT compared with CTT. It was influenced by the range of examinees’ performance and number of items. This influence was more pronounced in CTT. DISCUSSION: We found that conditional reliability shows inverse distributions, and that conclusions regarding the measurement precision at the cut score depend on the theory used. As the use of IRT seems to be more appropriate for criterion-oriented standard setting in the framework of competency-based medical education, our findings might have practical implications for the design and quality assurance of medical education assessments. |
format | Online Article Text |
id | pubmed-7459012 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2020 |
publisher | Bohn Stafleu van Loghum |
record_format | MEDLINE/PubMed |
journal | Perspect Med Educ |
dates | 2020-05-28 (published online); 2020-08 (issue) |
rights | © The Author(s) 2020. Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. |
title | Measurement precision at the cut score in medical multiple choice exams: Theory matters |
topic | Original Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7459012/ https://www.ncbi.nlm.nih.gov/pubmed/32468274 http://dx.doi.org/10.1007/s40037-020-00586-0 |
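The abstract reports that conditional reliability peaks at the extremes of the score scale under CTT but in the middle under IRT. The Python sketch that follows illustrates why those shapes arise; it is not the authors' procedure, since the record does not say how conditional reliability was computed. It assumes Lord's binomial error model for CTT, a Rasch model for IRT, and defines conditional reliability as one minus the conditional error variance divided by the total variance. The 100-item exam, its score variance, and the item difficulties are hypothetical values chosen only to show the shapes.

```python
import math


def ctt_conditional_reliability(x, n_items, score_variance):
    """Conditional reliability at raw score x, assuming Lord's binomial error model.

    Conditional SEM: CSEM(x) = sqrt(x * (n - x) / (n - 1)).
    Assumed definition: reliability(x) = 1 - CSEM(x)^2 / var(X).
    """
    csem_squared = x * (n_items - x) / (n_items - 1)
    return 1.0 - csem_squared / score_variance


def rasch_information(theta, difficulties):
    """Test information under the Rasch model: sum of p * (1 - p) over items."""
    info = 0.0
    for b in difficulties:
        p = 1.0 / (1.0 + math.exp(-(theta - b)))  # probability of a correct answer
        info += p * (1.0 - p)
    return info


def irt_conditional_reliability(theta, difficulties, theta_variance=1.0):
    """Assumed definition: 1 - SE(theta)^2 / var(theta), with SE^2 = 1 / I(theta)."""
    se_squared = 1.0 / rasch_information(theta, difficulties)
    return 1.0 - se_squared / theta_variance


# Hypothetical 100-item exam: the score variance and difficulties are made up.
N_ITEMS = 100
SCORE_VARIANCE = 100.0
DIFFICULTIES = [-2.0 + 4.0 * i / (N_ITEMS - 1) for i in range(N_ITEMS)]

if __name__ == "__main__":
    print("CTT: raw score -> conditional reliability")
    for x in (0, 10, 30, 50, 70, 90, 100):
        r = ctt_conditional_reliability(x, N_ITEMS, SCORE_VARIANCE)
        print(f"  {x:3d} -> {r:.3f}")
    print("Rasch IRT: theta -> conditional reliability")
    for theta in (-4.0, -2.0, 0.0, 2.0, 4.0):
        r = irt_conditional_reliability(theta, DIFFICULTIES)
        print(f"  {theta:+.1f} -> {r:.3f}")
```

Under these assumptions the CTT values equal 1.0 at raw scores 0 and 100 and fall to about 0.75 at a score of 50, because the binomial error variance is largest in the middle of the scale. The Rasch values instead peak near the centre of the item difficulty range, where items are most informative, which mirrors the inverse patterns described in the abstract.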