The standard error of measurement is a more appropriate measure of quality for postgraduate medical assessments than is reliability: an analysis of MRCP(UK) examinations
Main Authors: Tighe, Jane; McManus, IC; Dewhurst, Neil G; Chis, Liliana; Mucklow, John
Format: Text
Language: English
Published: BioMed Central, 2010
Subjects: Research Article
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2893515/ https://www.ncbi.nlm.nih.gov/pubmed/20525220 http://dx.doi.org/10.1186/1472-6920-10-40
_version_ | 1782183046341984256 |
author | Tighe, Jane; McManus, IC; Dewhurst, Neil G; Chis, Liliana; Mucklow, John
author_facet | Tighe, Jane; McManus, IC; Dewhurst, Neil G; Chis, Liliana; Mucklow, John
author_sort | Tighe, Jane |
collection | PubMed |
description | BACKGROUND: Cronbach's alpha is widely used as the preferred index of reliability for postgraduate medical examinations. A value of 0.8-0.9 is seen by providers and regulators alike as an adequate demonstration of acceptable reliability for any assessment. Of the other statistical parameters, the Standard Error of Measurement (SEM) is mainly seen as useful only in determining the accuracy of a pass mark. However, the alpha coefficient depends both on the SEM and on the ability range (standard deviation, SD) of candidates taking an exam. This study investigated the extent to which the necessarily narrower ability range of candidates taking the second of the three-part MRCP(UK) diploma examinations biases assessment of reliability and SEM. METHODS: a) The interrelationships of standard deviation (SD), SEM and reliability were investigated in a Monte Carlo simulation of 10,000 candidates taking a postgraduate examination. b) Reliability and SEM were studied in the MRCP(UK) Part 1 and Part 2 Written Examinations from 2002 to 2008. c) Reliability and SEM were studied in eight Specialty Certificate Examinations introduced in 2008-9. RESULTS: The Monte Carlo simulation showed, as expected, that restricting the range of an assessment only to those who had already passed it dramatically reduced the reliability but did not affect the SEM of the simulated assessment. The analysis of the MRCP(UK) Part 1 and Part 2 written examinations showed that the Part 2 written examination had a lower reliability than the Part 1 examination but, despite that lower reliability, also had a smaller SEM (indicating a more accurate assessment). The Specialty Certificate Examinations had small Ns and, as a result, wide variability in their reliabilities, but their SEMs were comparable with that of MRCP(UK) Part 2. CONCLUSIONS: An emphasis on assessing the quality of assessments primarily in terms of reliability alone can produce a paradoxical and distorted picture, particularly where a narrower range of candidate ability is an inevitable consequence of candidates being able to take a second-part examination only after passing the first. Reliability is also problematic when the number of candidates in an examination is low and sampling error affects the range of candidate ability. The SEM is not subject to these problems; it is therefore a better measure of the quality of an assessment and is recommended for routine use.
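The relationship the abstract relies on is the classical test theory identity SEM = SD × sqrt(1 − reliability). The sketch below is illustrative only (not the authors' code): the candidate count, item count, noise level and pass rule are hypothetical assumptions chosen to make the effect visible. It simulates an exam, computes Cronbach's alpha and the SEM, then repeats the calculation after restricting the cohort to the better-scoring half, reproducing the qualitative result that reliability falls under range restriction while the SEM is essentially unchanged.

```python
# Illustrative sketch only; all numbers are hypothetical assumptions.
import numpy as np

rng = np.random.default_rng(0)
n_candidates, n_items = 10_000, 100

# Latent ability of each candidate, plus independent noise on each item.
ability = rng.normal(0.0, 1.0, n_candidates)
items = ability[:, None] + rng.normal(0.0, 3.0, (n_candidates, n_items))

def alpha_and_sem(item_scores):
    """Cronbach's alpha and SEM for a candidates-by-items score matrix."""
    k = item_scores.shape[1]
    totals = item_scores.sum(axis=1)
    alpha = k / (k - 1) * (1 - item_scores.var(axis=0, ddof=1).sum()
                           / totals.var(ddof=1))
    sem = totals.std(ddof=1) * np.sqrt(1 - alpha)  # SEM = SD * sqrt(1 - alpha)
    return alpha, sem

a_all, sem_all = alpha_and_sem(items)
print(f"All candidates: alpha = {a_all:.2f}, SEM = {sem_all:.1f}")

# Restrict the range to the better-scoring half ("passers"), as when only
# those who pass Part 1 sit Part 2: alpha drops, the SEM barely moves.
passers = items.sum(axis=1) >= np.median(items.sum(axis=1))
a_pass, sem_pass = alpha_and_sem(items[passers])
print(f"Passers only:   alpha = {a_pass:.2f}, SEM = {sem_pass:.1f}")
```

Note that the SEM printed here is on the simulated raw total-score scale; the paper's SEMs are on each examination's own mark scale, so only the qualitative pattern, not the numbers, carries over.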
format | Text |
id | pubmed-2893515 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2010 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-28935152010-06-30 The standard error of measurement is a more appropriate measure of quality for postgraduate medical assessments than is reliability: an analysis of MRCP(UK) examinations Tighe, Jane McManus, IC Dewhurst, Neil G Chis, Liliana Mucklow, John BMC Med Educ Research Article BACKGROUND: Cronbach's alpha is widely used as the preferred index of reliability for medical postgraduate examinations. A value of 0.8-0.9 is seen by providers and regulators alike as an adequate demonstration of acceptable reliability for any assessment. Of the other statistical parameters, Standard Error of Measurement (SEM) is mainly seen as useful only in determining the accuracy of a pass mark. However the alpha coefficient depends both on SEM and on the ability range (standard deviation, SD) of candidates taking an exam. This study investigated the extent to which the necessarily narrower ability range in candidates taking the second of the three part MRCP(UK) diploma examinations, biases assessment of reliability and SEM. METHODS: a) The interrelationships of standard deviation (SD), SEM and reliability were investigated in a Monte Carlo simulation of 10,000 candidates taking a postgraduate examination. b) Reliability and SEM were studied in the MRCP(UK) Part 1 and Part 2 Written Examinations from 2002 to 2008. c) Reliability and SEM were studied in eight Specialty Certificate Examinations introduced in 2008-9. RESULTS: The Monte Carlo simulation showed, as expected, that restricting the range of an assessment only to those who had already passed it, dramatically reduced the reliability but did not affect the SEM of a simulated assessment. The analysis of the MRCP(UK) Part 1 and Part 2 written examinations showed that the MRCP(UK) Part 2 written examination had a lower reliability than the Part 1 examination, but, despite that lower reliability, the Part 2 examination also had a smaller SEM (indicating a more accurate assessment). The Specialty Certificate Examinations had small Ns, and as a result, wide variability in their reliabilities, but SEMs were comparable with MRCP(UK) Part 2. CONCLUSIONS: An emphasis upon assessing the quality of assessments primarily in terms of reliability alone can produce a paradoxical and distorted picture, particularly in the situation where a narrower range of candidate ability is an inevitable consequence of being able to take a second part examination only after passing the first part examination. Reliability also shows problems when numbers of candidates in examinations are low and sampling error affects the range of candidate ability. SEM is not subject to such problems; it is therefore a better measure of the quality of an assessment and is recommended for routine use. BioMed Central 2010-06-02 /pmc/articles/PMC2893515/ /pubmed/20525220 http://dx.doi.org/10.1186/1472-6920-10-40 Text en Copyright ©2010 Tighe et al; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. |
spellingShingle | Research Article Tighe, Jane McManus, IC Dewhurst, Neil G Chis, Liliana Mucklow, John The standard error of measurement is a more appropriate measure of quality for postgraduate medical assessments than is reliability: an analysis of MRCP(UK) examinations |
title | The standard error of measurement is a more appropriate measure of quality for postgraduate medical assessments than is reliability: an analysis of MRCP(UK) examinations |
title_full | The standard error of measurement is a more appropriate measure of quality for postgraduate medical assessments than is reliability: an analysis of MRCP(UK) examinations |
title_fullStr | The standard error of measurement is a more appropriate measure of quality for postgraduate medical assessments than is reliability: an analysis of MRCP(UK) examinations |
title_full_unstemmed | The standard error of measurement is a more appropriate measure of quality for postgraduate medical assessments than is reliability: an analysis of MRCP(UK) examinations |
title_short | The standard error of measurement is a more appropriate measure of quality for postgraduate medical assessments than is reliability: an analysis of MRCP(UK) examinations |
title_sort | standard error of measurement is a more appropriate measure of quality for postgraduate medical assessments than is reliability: an analysis of mrcp(uk) examinations |
topic | Research Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2893515/ https://www.ncbi.nlm.nih.gov/pubmed/20525220 http://dx.doi.org/10.1186/1472-6920-10-40 |
work_keys_str_mv | AT tighejane thestandarderrorofmeasurementisamoreappropriatemeasureofqualityforpostgraduatemedicalassessmentsthanisreliabilityananalysisofmrcpukexaminations AT mcmanusic thestandarderrorofmeasurementisamoreappropriatemeasureofqualityforpostgraduatemedicalassessmentsthanisreliabilityananalysisofmrcpukexaminations AT dewhurstneilg thestandarderrorofmeasurementisamoreappropriatemeasureofqualityforpostgraduatemedicalassessmentsthanisreliabilityananalysisofmrcpukexaminations AT chisliliana thestandarderrorofmeasurementisamoreappropriatemeasureofqualityforpostgraduatemedicalassessmentsthanisreliabilityananalysisofmrcpukexaminations AT mucklowjohn thestandarderrorofmeasurementisamoreappropriatemeasureofqualityforpostgraduatemedicalassessmentsthanisreliabilityananalysisofmrcpukexaminations AT tighejane standarderrorofmeasurementisamoreappropriatemeasureofqualityforpostgraduatemedicalassessmentsthanisreliabilityananalysisofmrcpukexaminations AT mcmanusic standarderrorofmeasurementisamoreappropriatemeasureofqualityforpostgraduatemedicalassessmentsthanisreliabilityananalysisofmrcpukexaminations AT dewhurstneilg standarderrorofmeasurementisamoreappropriatemeasureofqualityforpostgraduatemedicalassessmentsthanisreliabilityananalysisofmrcpukexaminations AT chisliliana standarderrorofmeasurementisamoreappropriatemeasureofqualityforpostgraduatemedicalassessmentsthanisreliabilityananalysisofmrcpukexaminations AT mucklowjohn standarderrorofmeasurementisamoreappropriatemeasureofqualityforpostgraduatemedicalassessmentsthanisreliabilityananalysisofmrcpukexaminations |