How suitable are clinical vignettes for the evaluation of symptom checker apps? A test theoretical perspective

OBJECTIVE: To evaluate the ability of case vignettes to assess the performance of symptom checker applications and to suggest refinements to the methodology used in case vignette-based audit studies. METHODS: We re-analyzed the publicly available data of two prominent case vignette-based symptom che...

Bibliographic Details
Main authors: Kopka, Marvin; Feufel, Markus A; Berner, Eta S; Schmieding, Malte L
Format: Online Article Text
Language: English
Published: SAGE Publications 2023
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10444026/
https://www.ncbi.nlm.nih.gov/pubmed/37614591
http://dx.doi.org/10.1177/20552076231194929
_version_ 1785093961462841344
author Kopka, Marvin
Feufel, Markus A
Berner, Eta S
Schmieding, Malte L
author_facet Kopka, Marvin
Feufel, Markus A
Berner, Eta S
Schmieding, Malte L
author_sort Kopka, Marvin
collection PubMed
description OBJECTIVE: To evaluate the ability of case vignettes to assess the performance of symptom checker applications and to suggest refinements to the methodology used in case vignette-based audit studies. METHODS: We re-analyzed the publicly available data of two prominent case vignette-based symptom checker audit studies by calculating common metrics of test theory. Furthermore, we developed a new metric, the Capability Comparison Score (CCS), which compares symptom checker capability while controlling for the difficulty of the set of cases each symptom checker evaluated. We then scrutinized whether applying test theory and the CCS altered the performance ranking of the investigated symptom checkers. RESULTS: In both studies, most symptom checkers changed their rank order when adjusting the triage capability for item difficulty (ID) with the CCS. The previously reported triage accuracies commonly overestimated the capability of symptom checkers because they did not account for the fact that symptom checkers tend to selectively appraise easier cases (i.e., with high ID values). Also, many case vignettes in both studies showed insufficient (very low and even negative) values of item-total correlation (ITC), suggesting that individual items or the composition of item sets are of low quality. CONCLUSIONS: A test–theoretic perspective helps identify previously undetected threats to the validity of case vignette-based symptom checker assessments and provides guidance and specific metrics to improve the quality of case vignettes, in particular by controlling for the difficulty of the vignettes an app was (not) able to evaluate correctly. Such measures might prove more meaningful than accuracy alone for the competitive assessment of symptom checkers. Our approach helps elaborate and standardize the methodology used for appraising symptom checker capability, which, ultimately, may yield more reliable results.
format Online
Article
Text
id pubmed-10444026
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher SAGE Publications
record_format MEDLINE/PubMed
spelling pubmed-104440262023-08-23 How suitable are clinical vignettes for the evaluation of symptom checker apps? A test theoretical perspective Kopka, Marvin Feufel, Markus A Berner, Eta S Schmieding, Malte L Digit Health Original Research OBJECTIVE: To evaluate the ability of case vignettes to assess the performance of symptom checker applications and to suggest refinements to the methodology used in case vignette-based audit studies. METHODS: We re-analyzed the publicly available data of two prominent case vignette-based symptom checker audit studies by calculating common metrics of test theory. Furthermore, we developed a new metric, the Capability Comparison Score (CCS), which compares symptom checker capability while controlling for the difficulty of the set of cases each symptom checker evaluated. We then scrutinized whether applying test theory and the CCS altered the performance ranking of the investigated symptom checkers. RESULTS: In both studies, most symptom checkers changed their rank order when adjusting the triage capability for item difficulty (ID) with the CCS. The previously reported triage accuracies commonly overestimated the capability of symptom checkers because they did not account for the fact that symptom checkers tend to selectively appraise easier cases (i.e., with high ID values). Also, many case vignettes in both studies showed insufficient (very low and even negative) values of item-total correlation (ITC), suggesting that individual items or the composition of item sets are of low quality. CONCLUSIONS: A test–theoretic perspective helps identify previously undetected threats to the validity of case vignette-based symptom checker assessments and provides guidance and specific metrics to improve the quality of case vignettes, in particular by controlling for the difficulty of the vignettes an app was (not) able to evaluate correctly. 
Such measures might prove more meaningful than accuracy alone for the competitive assessment of symptom checkers. Our approach helps elaborate and standardize the methodology used for appraising symptom checker capability, which, ultimately, may yield more reliable results. SAGE Publications 2023-08-21 /pmc/articles/PMC10444026/ /pubmed/37614591 http://dx.doi.org/10.1177/20552076231194929 Text en © The Author(s) 2023 https://creativecommons.org/licenses/by-nc/4.0/ This article is distributed under the terms of the Creative Commons Attribution-NonCommercial 4.0 License (https://creativecommons.org/licenses/by-nc/4.0/) which permits non-commercial use, reproduction and distribution of the work without further permission provided the original work is attributed as specified on the SAGE and Open Access page (https://us.sagepub.com/en-us/nam/open-access-at-sage).
spellingShingle Original Research
Kopka, Marvin
Feufel, Markus A
Berner, Eta S
Schmieding, Malte L
How suitable are clinical vignettes for the evaluation of symptom checker apps? A test theoretical perspective
title How suitable are clinical vignettes for the evaluation of symptom checker apps? A test theoretical perspective
title_full How suitable are clinical vignettes for the evaluation of symptom checker apps? A test theoretical perspective
title_fullStr How suitable are clinical vignettes for the evaluation of symptom checker apps? A test theoretical perspective
title_full_unstemmed How suitable are clinical vignettes for the evaluation of symptom checker apps? A test theoretical perspective
title_short How suitable are clinical vignettes for the evaluation of symptom checker apps? A test theoretical perspective
title_sort how suitable are clinical vignettes for the evaluation of symptom checker apps? a test theoretical perspective
topic Original Research
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10444026/
https://www.ncbi.nlm.nih.gov/pubmed/37614591
http://dx.doi.org/10.1177/20552076231194929
work_keys_str_mv AT kopkamarvin howsuitableareclinicalvignettesfortheevaluationofsymptomcheckerappsatesttheoreticalperspective
AT feufelmarkusa howsuitableareclinicalvignettesfortheevaluationofsymptomcheckerappsatesttheoreticalperspective
AT berneretas howsuitableareclinicalvignettesfortheevaluationofsymptomcheckerappsatesttheoreticalperspective
AT schmiedingmaltel howsuitableareclinicalvignettesfortheevaluationofsymptomcheckerappsatesttheoreticalperspective
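
The description field of this record refers to two classical test theory metrics, item difficulty (ID) and item-total correlation (ITC). The following is a minimal Python sketch of how such metrics are commonly computed, not the authors' implementation. It assumes a complete 0/1 score matrix in which each row is a symptom checker and each column a case vignette (1 = correct triage). The function names and the toy data are hypothetical, and the ITC shown is the "corrected" variant that excludes the item from the total. The Capability Comparison Score (CCS) is not sketched because this record does not give its formula.

```python
from math import sqrt


def item_difficulty(scores):
    """Share of checkers that solved each vignette (higher value = easier case)."""
    n_items = len(scores[0])
    return [sum(row[j] for row in scores) / len(scores) for j in range(n_items)]


def pearson(x, y):
    """Pearson correlation; returns NaN if either input has zero variance."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return float("nan")
    return cov / sqrt(var_x * var_y)


def item_total_correlation(scores):
    """Corrected ITC: each vignette's scores vs. the total over all other vignettes."""
    n_items = len(scores[0])
    result = []
    for j in range(n_items):
        item = [row[j] for row in scores]
        rest = [sum(row) - row[j] for row in scores]  # total without item j
        result.append(pearson(item, rest))
    return result


# Hypothetical toy data: 4 symptom checkers (rows) x 4 vignettes (columns).
scores = [
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 1],
    [0, 0, 0, 1],
]

difficulty = item_difficulty(scores)
itc = item_total_correlation(scores)
print(difficulty)  # [0.75, 0.5, 0.25, 0.5]
print(itc)
```

In this toy matrix the last vignette is solved only by the checkers that perform worst on the remaining cases, so its ITC is negative. That is the kind of item the abstract describes as showing insufficient ITC values.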