
IRT scoring procedures for TIMSS data


Bibliographic Details
Main Authors: Camilli, Gregory; Dossey, John A.
Format: Online Article Text
Language: English
Published: Elsevier, 2019
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6603296/
https://www.ncbi.nlm.nih.gov/pubmed/31304098
http://dx.doi.org/10.1016/j.mex.2019.06.015
Description
Summary: In large-scale international assessment programs, results for mathematics proficiency are typically reported for jurisdictions such as provinces or countries. An overall score is provided along with subscores based on content subdomains defined in the test specifications. In this paper, an alternative method for obtaining empirical subscores is described, in which the subscores are based on an exploratory item response theory (IRT) factor solution. This alternative scoring is intended to augment rather than replace traditional scoring procedures. The IRT scoring method is applied to the mathematics achievement data from the Trends in International Mathematics and Science Study (TIMSS). A brief overview of the method is given, along with additional material for validating the empirical subscores. The ultimate goal of scoring is to provide diagnostic feedback in the form of naturally occurring item clustering, which supplies useful information beyond traditional subscores based on test specifications. As shown by Camilli and Dossey (2019), the achievement ranks of countries may change depending on which empirical subscore of mathematics is considered, whereas traditional subscores are highly correlated and tend to produce similar rank orders.

• The method takes advantage of the TIMSS sampling design, specifically pairs of jackknife zones, to aggregate categorical response data to higher-order sampling units for IRT factor analysis (a minimal aggregation sketch follows this list).

• Once factor scores are estimated for the sampling units and interpreted, they are aggregated to the jurisdiction level (countries, states, provinces) using sampling weights. The procedure for obtaining standard errors of jurisdiction-level scores combines cross-sampling-unit variance and Monte Carlo sampling variation (see the second sketch below).

• Full technical details of the IRT factoring procedures are given in Camilli and Fox (2015), and Fox (2010) provides additional background on Bayesian item response modeling techniques. The estimation algorithm is based on stochastic approximation expectation-maximization (SAEM); a schematic SAEM recursion appears in the final sketch below.
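
To make the first step concrete, here is a minimal sketch of forming higher-order sampling units from pairs of jackknife zones and aggregating item responses within each unit. The data layout is assumed, not taken from the paper: the column names ("country", "jkzone", and 0/1 scored item columns), the consecutive-zone pairing rule, and the aggregation to per-item counts are all illustrative.

```python
# Minimal sketch, not the authors' implementation. Assumes a respondent-level
# table with hypothetical columns "country" and "jkzone" (1-based jackknife
# zone) plus 0/1 scored item columns.
import pandas as pd

def aggregate_to_units(responses: pd.DataFrame, item_cols: list) -> pd.DataFrame:
    df = responses.copy()
    # Assumed pairing rule: zones (1, 2) -> unit 1, zones (3, 4) -> unit 2, ...
    df["unit"] = (df["jkzone"] + 1) // 2
    # Unit-level correct counts and respondent counts per item: the kind of
    # aggregated categorical data to which an IRT factor model can be fit.
    return df.groupby(["country", "unit"])[item_cols].agg(["sum", "count"])
```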
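For the second step, the sketch below combines a weighted mean of unit-level factor scores with a standard error built from two components: a between-unit (design) term and a Monte Carlo term from estimation draws. The specific variance formulas are generic choices, not necessarily the paper's exact procedure.

```python
# Minimal sketch, not the paper's exact formulas. `draws` stands in for
# Monte Carlo draws of the unit scores (e.g., from Bayesian estimation).
import numpy as np

def jurisdiction_score(scores, weights, draws):
    """scores:  (n_units,) unit-level factor-score point estimates.
    weights: (n_units,) sampling weights.
    draws:   (n_draws, n_units) Monte Carlo draws of the unit scores."""
    scores = np.asarray(scores, dtype=float)
    w = np.asarray(weights, dtype=float)
    w /= w.sum()                                        # normalized weights
    est = float(w @ scores)                             # jurisdiction-level score
    between = float(np.sum(w**2 * (scores - est) ** 2))  # cross-unit variance
    mc = float(np.var(np.asarray(draws) @ w, ddof=1))    # Monte Carlo variance
    return est, (between + mc) ** 0.5                   # score, standard error
```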
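Finally, the SAEM algorithm mentioned above follows the generic stochastic-approximation EM recursion sketched here. Only the recursion itself is standard SAEM; the three callables are hypothetical stand-ins for the model-specific pieces (latent-variable simulation, complete-data sufficient statistics, and the M-step) of an IRT factor model.

```python
# Schematic SAEM recursion: simulate latent data, smooth the complete-data
# sufficient statistics with a decreasing step size (Robbins-Monro), and
# re-maximize. The callables are placeholders, not functions from the paper.
def saem(theta, data, simulate_latent, sufficient_stats, m_step, n_iter=500):
    s_bar = None
    for k in range(1, n_iter + 1):
        z = simulate_latent(theta, data)   # S-step: draw latent variables
        s = sufficient_stats(data, z)      # complete-data statistics
        gamma = 1.0 / k                    # decreasing step size
        s_bar = s if s_bar is None else [
            sb + gamma * (si - sb) for sb, si in zip(s_bar, s)
        ]                                  # SA-step: smooth the statistics
        theta = m_step(s_bar)              # M-step: update parameters
    return theta
```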