State of the psychometric methods: patient-reported outcome measure development and refinement using item response theory

BACKGROUND: This paper is part of a series comparing different psychometric approaches to evaluate patient-reported outcome (PRO) measures using the same items and dataset. We provide an overview and example application to demonstrate 1) using item response theory (IRT) to identify poor and well per...

Bibliographic Details
Main authors: Stover, Angela M., McLeod, Lori D., Langer, Michelle M., Chen, Wen-Hung, Reeve, Bryce B.
Format: Online Article Text
Language: English
Published: Springer International Publishing 2019
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6663947/
https://www.ncbi.nlm.nih.gov/pubmed/31359210
http://dx.doi.org/10.1186/s41687-019-0130-5
_version_ 1783439803246706688
author Stover, Angela M.
McLeod, Lori D.
Langer, Michelle M.
Chen, Wen-Hung
Reeve, Bryce B.
author_facet Stover, Angela M.
McLeod, Lori D.
Langer, Michelle M.
Chen, Wen-Hung
Reeve, Bryce B.
author_sort Stover, Angela M.
collection PubMed
description BACKGROUND: This paper is part of a series comparing different psychometric approaches to evaluate patient-reported outcome (PRO) measures using the same items and dataset. We provide an overview and example application to demonstrate 1) using item response theory (IRT) to identify poor and well performing items; 2) testing if items perform differently based on demographic characteristics (differential item functioning, DIF); and 3) balancing IRT and content validity considerations to select items for short forms. METHODS: Model fit, local dependence, and DIF were examined for 51 items initially considered for the Patient-Reported Outcomes Measurement Information System® (PROMIS®) Depression item bank. Samejima’s graded response model was used to examine how well each item measured severity levels of depression and how well it distinguished between individuals with high and low levels of depression. Two short forms were constructed based on psychometric properties and consensus discussions with instrument developers, including psychometricians and content experts. Calibrations presented here are for didactic purposes and are not intended to replace official PROMIS parameters or to be used for research. RESULTS: Of the 51 depression items, 14 exhibited local dependence, 3 exhibited DIF for gender, and 9 exhibited misfit, and these items were removed from consideration for short forms. Short form 1 prioritized content, and thus items were chosen to meet DSM-V criteria rather than being discarded for lower discrimination parameters. Short form 2 prioritized well performing items, and thus fewer DSM-V criteria were satisfied. Short forms 1–2 performed similarly for model fit statistics, but short form 2 provided greater item precision. CONCLUSIONS: IRT is a family of flexible models providing item- and scale-level information, making it a powerful tool for scale construction and refinement. 
Strengths of IRT models include placing respondents and items on the same metric, testing DIF across demographic or clinical subgroups, and facilitating creation of targeted short forms. Limitations include large sample sizes to obtain stable item parameters, and necessary familiarity with measurement methods to interpret results. Combining psychometric data with stakeholder input (including people with lived experiences of the health condition and clinicians) is highly recommended for scale development and evaluation.
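The abstract describes Samejima's graded response model as capturing both how well an item measures severity levels and how well it separates respondents with high and low depression. As a minimal sketch of that idea (not the authors' calibration code, and not official PROMIS parameters), the Python below computes response-category probabilities for a single item. The function name `grm_category_probs`, the five response categories, and all parameter values are hypothetical and chosen only for illustration.

```python
import math


def grm_category_probs(theta, a, thresholds):
    """Category probabilities for one item under the graded response model.

    theta      -- respondent's latent trait level (e.g. depression severity)
    a          -- item discrimination (slope) parameter
    thresholds -- ordered threshold (location) parameters b_1 < ... < b_{K-1}
    """
    # Cumulative probabilities P(X >= k): 1 for the lowest category,
    # a logistic curve for each threshold, and 0 beyond the highest category.
    cumulative = [1.0]
    for b in thresholds:
        cumulative.append(1.0 / (1.0 + math.exp(-a * (theta - b))))
    cumulative.append(0.0)
    # P(X = k) is the difference between adjacent cumulative probabilities.
    return [cumulative[k] - cumulative[k + 1] for k in range(len(cumulative) - 1)]


# Hypothetical item with five response categories (four thresholds).
item_a = 2.5
item_b = [0.0, 0.8, 1.6, 2.4]

probs_low = grm_category_probs(theta=-1.0, a=item_a, thresholds=item_b)
probs_high = grm_category_probs(theta=2.0, a=item_a, thresholds=item_b)

for label, probs in (("theta = -1.0", probs_low), ("theta =  2.0", probs_high)):
    print(label, [round(p, 3) for p in probs])
```

In this sketch a larger `a` makes the category curves steeper, so the item distinguishes more sharply between respondents near its thresholds, while the thresholds determine where along the severity continuum the item is informative.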
format Online
Article
Text
id pubmed-6663947
institution National Center for Biotechnology Information
language English
publishDate 2019
publisher Springer International Publishing
record_format MEDLINE/PubMed
spelling pubmed-66639472019-08-12 State of the psychometric methods: patient-reported outcome measure development and refinement using item response theory Stover, Angela M. McLeod, Lori D. Langer, Michelle M. Chen, Wen-Hung Reeve, Bryce B. J Patient Rep Outcomes Research BACKGROUND: This paper is part of a series comparing different psychometric approaches to evaluate patient-reported outcome (PRO) measures using the same items and dataset. We provide an overview and example application to demonstrate 1) using item response theory (IRT) to identify poor and well performing items; 2) testing if items perform differently based on demographic characteristics (differential item functioning, DIF); and 3) balancing IRT and content validity considerations to select items for short forms. METHODS: Model fit, local dependence, and DIF were examined for 51 items initially considered for the Patient-Reported Outcomes Measurement Information System® (PROMIS®) Depression item bank. Samejima’s graded response model was used to examine how well each item measured severity levels of depression and how well it distinguished between individuals with high and low levels of depression. Two short forms were constructed based on psychometric properties and consensus discussions with instrument developers, including psychometricians and content experts. Calibrations presented here are for didactic purposes and are not intended to replace official PROMIS parameters or to be used for research. RESULTS: Of the 51 depression items, 14 exhibited local dependence, 3 exhibited DIF for gender, and 9 exhibited misfit, and these items were removed from consideration for short forms. Short form 1 prioritized content, and thus items were chosen to meet DSM-V criteria rather than being discarded for lower discrimination parameters. Short form 2 prioritized well performing items, and thus fewer DSM-V criteria were satisfied. 
Short forms 1–2 performed similarly for model fit statistics, but short form 2 provided greater item precision. CONCLUSIONS: IRT is a family of flexible models providing item- and scale-level information, making it a powerful tool for scale construction and refinement. Strengths of IRT models include placing respondents and items on the same metric, testing DIF across demographic or clinical subgroups, and facilitating creation of targeted short forms. Limitations include large sample sizes to obtain stable item parameters, and necessary familiarity with measurement methods to interpret results. Combining psychometric data with stakeholder input (including people with lived experiences of the health condition and clinicians) is highly recommended for scale development and evaluation. Springer International Publishing 2019-07-30 /pmc/articles/PMC6663947/ /pubmed/31359210 http://dx.doi.org/10.1186/s41687-019-0130-5 Text en © The Author(s) 2019. Open Access: This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.
spellingShingle Research
Stover, Angela M.
McLeod, Lori D.
Langer, Michelle M.
Chen, Wen-Hung
Reeve, Bryce B.
State of the psychometric methods: patient-reported outcome measure development and refinement using item response theory
title State of the psychometric methods: patient-reported outcome measure development and refinement using item response theory
title_full State of the psychometric methods: patient-reported outcome measure development and refinement using item response theory
title_fullStr State of the psychometric methods: patient-reported outcome measure development and refinement using item response theory
title_full_unstemmed State of the psychometric methods: patient-reported outcome measure development and refinement using item response theory
title_short State of the psychometric methods: patient-reported outcome measure development and refinement using item response theory
title_sort state of the psychometric methods: patient-reported outcome measure development and refinement using item response theory
topic Research
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6663947/
https://www.ncbi.nlm.nih.gov/pubmed/31359210
http://dx.doi.org/10.1186/s41687-019-0130-5