
Are ChatGPT’s Free-Text Responses on Periprosthetic Joint Infections of the Hip and Knee Reliable and Useful?

Background: This study aimed to evaluate ChatGPT’s performance on questions about periprosthetic joint infections (PJI) of the hip and knee. Methods: Twenty-seven questions from the 2018 International Consensus Meeting on Musculoskeletal Infection were selected for response generation. The free-text...


Bibliographic Details
Main Authors: Draschl, Alexander, Hauer, Georg, Fischerauer, Stefan Franz, Kogler, Angelika, Leitner, Lukas, Andreou, Dimosthenis, Leithner, Andreas, Sadoghi, Patrick
Format: Online Article Text
Language: English
Published: MDPI 2023
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10607052/
https://www.ncbi.nlm.nih.gov/pubmed/37892793
http://dx.doi.org/10.3390/jcm12206655
_version_ 1785127455325945856
author Draschl, Alexander
Hauer, Georg
Fischerauer, Stefan Franz
Kogler, Angelika
Leitner, Lukas
Andreou, Dimosthenis
Leithner, Andreas
Sadoghi, Patrick
author_facet Draschl, Alexander
Hauer, Georg
Fischerauer, Stefan Franz
Kogler, Angelika
Leitner, Lukas
Andreou, Dimosthenis
Leithner, Andreas
Sadoghi, Patrick
author_sort Draschl, Alexander
collection PubMed
description Background: This study aimed to evaluate ChatGPT’s performance on questions about periprosthetic joint infections (PJI) of the hip and knee. Methods: Twenty-seven questions from the 2018 International Consensus Meeting on Musculoskeletal Infection were selected for response generation. The free-text responses were evaluated by three orthopedic surgeons using a five-point Likert scale. Inter-rater reliability (IRR) was assessed via Fleiss’ kappa (FK). Results: Overall, near-perfect IRR was found for disagreement on the presence of factual errors (FK: 0.880, 95% CI [0.724, 1.035], p < 0.001) and agreement on information completeness (FK: 0.848, 95% CI [0.699, 0.996], p < 0.001). Substantial IRR was observed for disagreement on misleading information (FK: 0.743, 95% CI [0.601, 0.886], p < 0.001) and agreement on suitability for patients (FK: 0.627, 95% CI [0.478, 0.776], p < 0.001). Moderate IRR was observed for agreement on “up-to-dateness” (FK: 0.584, 95% CI [0.434, 0.734], p < 0.001) and suitability for orthopedic surgeons (FK: 0.505, 95% CI [0.383, 0.628], p < 0.001). Question- and subtopic-specific analysis revealed diverse IRR levels ranging from near-perfect to poor. Conclusions: ChatGPT’s free-text responses to complex orthopedic questions were predominantly reliable and useful for orthopedic surgeons and patients. Given variations in performance by question and subtopic, consulting additional sources and exercising careful interpretation should be emphasized for reliable medical decision-making.
format Online
Article
Text
id pubmed-10607052
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher MDPI
record_format MEDLINE/PubMed
spelling pubmed-106070522023-10-28 Are ChatGPT’s Free-Text Responses on Periprosthetic Joint Infections of the Hip and Knee Reliable and Useful? Draschl, Alexander Hauer, Georg Fischerauer, Stefan Franz Kogler, Angelika Leitner, Lukas Andreou, Dimosthenis Leithner, Andreas Sadoghi, Patrick J Clin Med Article Background: This study aimed to evaluate ChatGPT’s performance on questions about periprosthetic joint infections (PJI) of the hip and knee. Methods: Twenty-seven questions from the 2018 International Consensus Meeting on Musculoskeletal Infection were selected for response generation. The free-text responses were evaluated by three orthopedic surgeons using a five-point Likert scale. Inter-rater reliability (IRR) was assessed via Fleiss’ kappa (FK). Results: Overall, near-perfect IRR was found for disagreement on the presence of factual errors (FK: 0.880, 95% CI [0.724, 1.035], p < 0.001) and agreement on information completeness (FK: 0.848, 95% CI [0.699, 0.996], p < 0.001). Substantial IRR was observed for disagreement on misleading information (FK: 0.743, 95% CI [0.601, 0.886], p < 0.001) and agreement on suitability for patients (FK: 0.627, 95% CI [0.478, 0.776], p < 0.001). Moderate IRR was observed for agreement on “up-to-dateness” (FK: 0.584, 95% CI [0.434, 0.734], p < 0.001) and suitability for orthopedic surgeons (FK: 0.505, 95% CI [0.383, 0.628], p < 0.001). Question- and subtopic-specific analysis revealed diverse IRR levels ranging from near-perfect to poor. Conclusions: ChatGPT’s free-text responses to complex orthopedic questions were predominantly reliable and useful for orthopedic surgeons and patients. Given variations in performance by question and subtopic, consulting additional sources and exercising careful interpretation should be emphasized for reliable medical decision-making. MDPI 2023-10-20 /pmc/articles/PMC10607052/ /pubmed/37892793 http://dx.doi.org/10.3390/jcm12206655 Text en © 2023 by the authors. 
https://creativecommons.org/licenses/by/4.0/ Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
spellingShingle Article
Draschl, Alexander
Hauer, Georg
Fischerauer, Stefan Franz
Kogler, Angelika
Leitner, Lukas
Andreou, Dimosthenis
Leithner, Andreas
Sadoghi, Patrick
Are ChatGPT’s Free-Text Responses on Periprosthetic Joint Infections of the Hip and Knee Reliable and Useful?
title Are ChatGPT’s Free-Text Responses on Periprosthetic Joint Infections of the Hip and Knee Reliable and Useful?
title_full Are ChatGPT’s Free-Text Responses on Periprosthetic Joint Infections of the Hip and Knee Reliable and Useful?
title_fullStr Are ChatGPT’s Free-Text Responses on Periprosthetic Joint Infections of the Hip and Knee Reliable and Useful?
title_full_unstemmed Are ChatGPT’s Free-Text Responses on Periprosthetic Joint Infections of the Hip and Knee Reliable and Useful?
title_short Are ChatGPT’s Free-Text Responses on Periprosthetic Joint Infections of the Hip and Knee Reliable and Useful?
title_sort are chatgpt’s free-text responses on periprosthetic joint infections of the hip and knee reliable and useful?
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10607052/
https://www.ncbi.nlm.nih.gov/pubmed/37892793
http://dx.doi.org/10.3390/jcm12206655
work_keys_str_mv AT draschlalexander arechatgptsfreetextresponsesonperiprostheticjointinfectionsofthehipandkneereliableanduseful
AT hauergeorg arechatgptsfreetextresponsesonperiprostheticjointinfectionsofthehipandkneereliableanduseful
AT fischerauerstefanfranz arechatgptsfreetextresponsesonperiprostheticjointinfectionsofthehipandkneereliableanduseful
AT koglerangelika arechatgptsfreetextresponsesonperiprostheticjointinfectionsofthehipandkneereliableanduseful
AT leitnerlukas arechatgptsfreetextresponsesonperiprostheticjointinfectionsofthehipandkneereliableanduseful
AT andreoudimosthenis arechatgptsfreetextresponsesonperiprostheticjointinfectionsofthehipandkneereliableanduseful
AT leithnerandreas arechatgptsfreetextresponsesonperiprostheticjointinfectionsofthehipandkneereliableanduseful
AT sadoghipatrick arechatgptsfreetextresponsesonperiprostheticjointinfectionsofthehipandkneereliableanduseful
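
Note on the statistic named in the description: the abstract reports inter-rater reliability as Fleiss' kappa for three raters scoring on a five-point Likert scale. The sketch below shows one way such a value could be computed and labeled. It is not the authors' analysis code. The function names, the toy ratings, and the choice of the Landis and Koch interpretation bands are assumptions made for illustration; the record does not say which software or which interpretation scale the study used.

```python
from collections import Counter


def fleiss_kappa(ratings, categories):
    """Fleiss' kappa for a fixed number of raters per item.

    ratings: list of items, each a list with one label per rater
             (hypothetical input layout, chosen for this sketch).
    categories: all labels that raters could assign.
    """
    n_items = len(ratings)
    n_raters = len(ratings[0])
    if n_raters < 2 or any(len(item) != n_raters for item in ratings):
        raise ValueError("every item needs the same number (>= 2) of ratings")

    # n_ij: how many raters put item i into category j
    counts = [Counter(item) for item in ratings]

    # p_j: share of all assignments that went to category j
    p_j = [sum(c[cat] for c in counts) / (n_items * n_raters) for cat in categories]

    # P_i: observed agreement among rater pairs for item i
    p_i = [
        (sum(c[cat] ** 2 for cat in categories) - n_raters)
        / (n_raters * (n_raters - 1))
        for c in counts
    ]

    p_bar = sum(p_i) / n_items      # mean observed agreement
    p_e = sum(p * p for p in p_j)   # agreement expected by chance
    if p_e == 1.0:
        raise ValueError("kappa is undefined when only one category is used")
    return (p_bar - p_e) / (1 - p_e)


def interpret(kappa):
    """Label kappa with the Landis and Koch bands (an assumed scale)."""
    if kappa < 0:
        return "poor"
    for upper, label in [(0.20, "slight"), (0.40, "fair"),
                         (0.60, "moderate"), (0.80, "substantial")]:
        if kappa <= upper:
            return label
    return "almost perfect"


if __name__ == "__main__":
    # Toy data, not from the study: 4 responses, 3 raters, Likert scores 1-5.
    toy = [[5, 5, 5], [4, 4, 5], [2, 2, 2], [4, 5, 4]]
    k = fleiss_kappa(toy, categories=[1, 2, 3, 4, 5])
    print(round(k, 3), interpret(k))
```

Under these assumed bands, the overall values quoted in the abstract (for example 0.880, 0.743, and 0.584) would fall into the top, "substantial", and "moderate" bands respectively, which matches the wording of the abstract.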