APPRAISE-AI Tool for Quantitative Evaluation of AI Studies for Clinical Decision Support

IMPORTANCE: Artificial intelligence (AI) has gained considerable attention in health care, yet concerns have been raised around appropriate methods and fairness. Current AI reporting guidelines do not provide a means of quantifying overall quality of AI research, limiting their ability to compare models addressing the same clinical question.

Full description

Bibliographic Details
Main Authors: Kwong, Jethro C. C., Khondker, Adree, Lajkosz, Katherine, McDermott, Matthew B. A., Frigola, Xavier Borrat, McCradden, Melissa D., Mamdani, Muhammad, Kulkarni, Girish S., Johnson, Alistair E. W.
Format: Online Article Text
Language: English
Published: American Medical Association 2023
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10520738/
https://www.ncbi.nlm.nih.gov/pubmed/37747733
http://dx.doi.org/10.1001/jamanetworkopen.2023.35377
author Kwong, Jethro C. C.
Khondker, Adree
Lajkosz, Katherine
McDermott, Matthew B. A.
Frigola, Xavier Borrat
McCradden, Melissa D.
Mamdani, Muhammad
Kulkarni, Girish S.
Johnson, Alistair E. W.
collection PubMed
description IMPORTANCE: Artificial intelligence (AI) has gained considerable attention in health care, yet concerns have been raised around appropriate methods and fairness. Current AI reporting guidelines do not provide a means of quantifying overall quality of AI research, limiting their ability to compare models addressing the same clinical question.
OBJECTIVE: To develop a tool (APPRAISE-AI) to evaluate the methodological and reporting quality of AI prediction models for clinical decision support.
DESIGN, SETTING, AND PARTICIPANTS: This quality improvement study evaluated AI studies in the model development, silent, and clinical trial phases using the APPRAISE-AI tool, a quantitative method for evaluating quality of AI studies across 6 domains: clinical relevance, data quality, methodological conduct, robustness of results, reporting quality, and reproducibility. These domains included 24 items with a maximum overall score of 100 points. Points were assigned to each item, with higher points indicating stronger methodological or reporting quality. The tool was applied to a systematic review on machine learning to estimate sepsis that included articles published until September 13, 2019. Data analysis was performed from September to December 2022.
MAIN OUTCOMES AND MEASURES: The primary outcomes were interrater and intrarater reliability and the correlation between APPRAISE-AI scores and expert scores, 3-year citation rate, number of Quality Assessment of Diagnostic Accuracy Studies (QUADAS-2) low risk-of-bias domains, and overall adherence to the Transparent Reporting of a Multivariable Prediction Model for Individual Prognosis or Diagnosis (TRIPOD) statement.
RESULTS: A total of 28 studies were included. Overall APPRAISE-AI scores ranged from 33 (low quality) to 67 (high quality). Most studies were moderate quality. The 5 lowest scoring items included source of data, sample size calculation, bias assessment, error analysis, and transparency. Overall APPRAISE-AI scores were associated with expert scores (Spearman ρ, 0.82; 95% CI, 0.64-0.91; P < .001), 3-year citation rate (Spearman ρ, 0.69; 95% CI, 0.43-0.85; P < .001), number of QUADAS-2 low risk-of-bias domains (Spearman ρ, 0.56; 95% CI, 0.24-0.77; P = .002), and adherence to the TRIPOD statement (Spearman ρ, 0.87; 95% CI, 0.73-0.94; P < .001). Intraclass correlation coefficient ranges for interrater and intrarater reliability were 0.74 to 1.00 for individual items, 0.81 to 0.99 for individual domains, and 0.91 to 0.98 for overall scores.
CONCLUSIONS AND RELEVANCE: In this quality improvement study, APPRAISE-AI demonstrated strong interrater and intrarater reliability and correlated well with several study quality measures. This tool may provide a quantitative approach for investigators, reviewers, editors, and funding organizations to compare the research quality across AI studies for clinical decision support.
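The description above characterizes APPRAISE-AI as a 100-point score summed across 6 domains (24 items) and validated against expert scores using Spearman correlations and intraclass correlation coefficients. As a minimal illustrative sketch only, the Python below tallies an overall score from hypothetical per-domain maximums (the published tool defines the actual items and point allocations, which are not reproduced here) and computes a Spearman ρ with an approximate Fisher-z 95% CI of the kind reported; all data in the example are invented.

import numpy as np
from scipy.stats import spearmanr

# Hypothetical per-domain maximums summing to the 100-point scale; these are
# placeholders, not the point allocations defined in the published tool.
DOMAIN_MAX = {
    "clinical_relevance": 10,
    "data_quality": 20,
    "methodological_conduct": 30,
    "robustness_of_results": 20,
    "reporting_quality": 10,
    "reproducibility": 10,
}

def overall_score(domain_points):
    """Sum per-domain points, capping each domain at its assumed maximum."""
    return sum(min(domain_points.get(d, 0), cap) for d, cap in DOMAIN_MAX.items())

def spearman_with_ci(x, y):
    """Spearman rho with an approximate 95% CI via the Fisher z transform."""
    rho, p = spearmanr(x, y)
    z, se = np.arctanh(rho), 1.0 / np.sqrt(len(x) - 3)  # normal approximation
    return rho, (np.tanh(z - 1.96 * se), np.tanh(z + 1.96 * se)), p

if __name__ == "__main__":
    # Invented data: six hypothetical studies (per-domain points) and made-up
    # expert scores, used only to show the shape of the analysis.
    domains = list(DOMAIN_MAX)
    study_points = [
        [8, 12, 18, 10, 6, 3],
        [9, 16, 24, 14, 8, 6],
        [6, 10, 12, 8, 5, 2],
        [7, 14, 20, 12, 7, 4],
        [9, 18, 26, 16, 9, 8],
        [5, 8, 10, 6, 4, 1],
    ]
    expert_scores = [55, 72, 40, 74, 80, 35]
    tool_scores = [overall_score(dict(zip(domains, pts))) for pts in study_points]
    rho, (lo, hi), p = spearman_with_ci(tool_scores, expert_scores)
    print(f"Spearman rho = {rho:.2f}, 95% CI {lo:.2f}-{hi:.2f}, P = {p:.3f}")

Note that the Fisher-z interval is only an approximation for Spearman correlations; the original study may have used a different method (for example, bootstrapping) to obtain its confidence intervals.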
format Online
Article
Text
id pubmed-10520738
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher American Medical Association
record_format MEDLINE/PubMed
spelling pubmed-10520738 2023-09-27 APPRAISE-AI Tool for Quantitative Evaluation of AI Studies for Clinical Decision Support. JAMA Netw Open, Original Investigation. American Medical Association 2023-09-25 /pmc/articles/PMC10520738/ /pubmed/37747733 http://dx.doi.org/10.1001/jamanetworkopen.2023.35377 Text en Copyright 2023 Kwong JCC et al. JAMA Network Open.
https://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the CC-BY License.
title APPRAISE-AI Tool for Quantitative Evaluation of AI Studies for Clinical Decision Support
topic Original Investigation
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10520738/
https://www.ncbi.nlm.nih.gov/pubmed/37747733
http://dx.doi.org/10.1001/jamanetworkopen.2023.35377