Using differential item functioning to evaluate potential bias in a high stakes postgraduate knowledge based assessment

Bibliographic Details
Main Authors: Hope, David, Adamson, Karen, McManus, I. C., Chis, Liliana, Elder, Andrew
Format: Online Article Text
Language: English
Published: BioMed Central 2018
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5883583/
https://www.ncbi.nlm.nih.gov/pubmed/29615016
http://dx.doi.org/10.1186/s12909-018-1143-0
author Hope, David
Adamson, Karen
McManus, I. C.
Chis, Liliana
Elder, Andrew
collection PubMed
description BACKGROUND: Fairness is a critical component of defensible assessment. Candidates should perform according to ability without influence from background characteristics such as ethnicity or sex. However, performance differs by candidate background in many assessment environments. Many potential causes of such differences exist, and examinations must be routinely analysed to ensure they do not present inappropriate progression barriers for any candidate group. By analysing the individual questions of an examination through techniques such as Differential Item Functioning (DIF), we can test whether a subset of unfair questions explains group-level differences. Such items can then be revised or removed.
METHODS: We used DIF to investigate fairness for 13,694 candidates sitting a major international summative postgraduate examination in internal medicine. We compared (a) ethnically white UK graduates against ethnically non-white UK graduates and (b) male UK graduates against female UK graduates. DIF was used to test 2773 questions across 14 sittings.
RESULTS: Across 2773 questions, eight (0.29%) showed notable DIF after correcting for multiple comparisons: seven medium effects and one large effect. Blinded analysis of these questions by a panel of clinician assessors identified no plausible explanations for the differences. These questions were removed from the question bank and we present them here to share knowledge of questions with DIF. These questions did not significantly impact the overall performance of the cohort. Group-level differences in performance between the groups we studied in this examination cannot be explained by a subset of unfair questions.
CONCLUSIONS: DIF helps explore fairness in assessment at the question level. This is especially important in high-stakes assessment where a small number of unfair questions may adversely impact the passing rates of some groups. However, very few questions exhibited notable DIF, so differences in passing rates for the groups we studied cannot be explained by unfairness at the question level.
ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (10.1186/s12909-018-1143-0) contains supplementary material, which is available to authorized users.
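The abstract does not state which DIF statistic the authors applied, so the following is only a minimal illustrative sketch, in Python, of one common screening approach: logistic-regression DIF, testing whether group membership predicts an item response after conditioning on total score, with a Bonferroni correction across all items tested. All data, variable names, and thresholds below are simulated assumptions, not the study's data or procedure.

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n_candidates, n_items = 2000, 50

# Simulated ability, group membership (0 = reference, 1 = focal) and
# dichotomous item responses generated WITHOUT any built-in DIF.
ability = rng.normal(size=n_candidates)
group = rng.integers(0, 2, size=n_candidates)
difficulty = rng.normal(size=n_items)
prob = 1.0 / (1.0 + np.exp(-(ability[:, None] - difficulty[None, :])))
responses = (rng.random((n_candidates, n_items)) < prob).astype(int)

total = responses.sum(axis=1)  # matching variable: total test score

p_values = []
for item in range(n_items):
    # Uniform DIF test: does group membership predict the item response
    # once total score is controlled for?
    X = sm.add_constant(np.column_stack([total, group]))
    fit = sm.Logit(responses[:, item], X).fit(disp=False)
    p_values.append(fit.pvalues[2])  # p-value of the group coefficient

alpha = 0.05 / n_items  # Bonferroni correction for multiple comparisons
flagged = [i for i, p in enumerate(p_values) if p < alpha]
print("Items flagged for possible uniform DIF:", flagged)

In practice, flagged items would then go to content experts for blinded review, as in the study, rather than being removed automatically.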
format Online
Article
Text
id pubmed-5883583
institution National Center for Biotechnology Information
language English
publishDate 2018
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-5883583 2018-04-09 BMC Med Educ, Research Article. BioMed Central 2018-04-03 /pmc/articles/PMC5883583/ /pubmed/29615016 http://dx.doi.org/10.1186/s12909-018-1143-0 Text en © The Author(s). 2018. Open Access: this article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title Using differential item functioning to evaluate potential bias in a high stakes postgraduate knowledge based assessment
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5883583/
https://www.ncbi.nlm.nih.gov/pubmed/29615016
http://dx.doi.org/10.1186/s12909-018-1143-0