Practical Consequences of Item Response Theory Model Misfit in the Context of Test Equating with Mixed-Format Test Data

In item response theory (IRT) modeling, assessing model-data fit is an essential step in calibration. While no general agreement has been reached on the best methods for detecting misfit, perhaps the more important observation from the research literature is that studies rarely evaluate IRT misfit by focusing on its practical consequences. This study investigated the practical consequences of IRT model misfit for equating performance and for the classification of examinees into performance categories, using a simulation that mimics a typical large-scale statewide assessment program with mixed-format test data. The simulation varied three factors: choice of IRT model, amount of growth or change in examinees' abilities between two adjacent administration years, and choice of IRT scaling method. Findings indicated that the extent of the consequences of model misfit varied with the choice of model and scaling method. In comparison with separate calibration with linking via the mean/sigma (MS) and Stocking-Lord characteristic curve (SL) methods, the fixed common item parameter (FCIP) procedure was more sensitive to model misfit and more robust against varying amounts of ability shift between two adjacent administrations, regardless of model fit. SL was generally the least sensitive to model misfit in recovering the equating conversion, and MS was the least robust against ability shifts in recovering the equating conversion when a substantial degree of misfit was present. The key messages of the study are that practical ways are available to study model fit and that model fit or misfit can have consequences that should be considered when choosing an IRT model. Beyond addressing the consequences of IRT model misfit, we hope to help researchers and practitioners find practical ways to study model fit, to investigate the validity of particular IRT models for a specified purpose, to assure that IRT models are used successfully, and to improve the application of IRT models to educational and psychological test data.
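The abstract refers to the mean/sigma (MS) linking method for placing separately calibrated item parameters on a common scale. As a point of reference, here is a minimal sketch of the general form of that transformation, not the authors' implementation; the function name, inputs, and example values are hypothetical. Linking constants A and B are estimated from the common (anchor) items' difficulty estimates, and abilities and item parameters on the new-form scale are then rescaled to the base scale.

```python
import numpy as np

def mean_sigma_constants(b_base, b_new):
    """Mean/sigma linking constants from common-item difficulty estimates.

    b_base: difficulties of the anchor items on the base (old-form) scale.
    b_new:  difficulties of the same anchor items on the new-form scale.
    Returns (A, B) such that theta_base ~= A * theta_new + B.
    """
    b_base = np.asarray(b_base, dtype=float)
    b_new = np.asarray(b_new, dtype=float)
    A = b_base.std(ddof=1) / b_new.std(ddof=1)  # ratio of standard deviations
    B = b_base.mean() - A * b_new.mean()        # aligns the means after scaling
    return A, B

# Hypothetical anchor-item difficulties from two separate calibrations.
A, B = mean_sigma_constants([-0.8, 0.1, 0.9, 1.4], [-1.1, -0.2, 0.6, 1.2])

theta_new = 0.35
theta_base = A * theta_new + B   # ability expressed on the base scale
# Item parameters transform as: b -> A * b + B, and a -> a / A.
```

By contrast, the Stocking-Lord method estimates the same two constants by minimizing the squared difference between test characteristic curves computed over the anchor items, and FCIP fixes the anchor-item parameters at their base-scale values during calibration of the new form.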

Bibliographic Details
Main Authors: Zhao, Yue; Hambleton, Ronald K.
Format: Online, Article, Text
Language: English
Published: Frontiers Media S.A., 2017
Subjects: Psychology
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5378703/
https://www.ncbi.nlm.nih.gov/pubmed/28421011
http://dx.doi.org/10.3389/fpsyg.2017.00484
Published online 2017-04-04. Copyright © 2017 Zhao and Hambleton. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY): http://creativecommons.org/licenses/by/4.0/