Are ChatGPT’s Free-Text Responses on Periprosthetic Joint Infections of the Hip and Knee Reliable and Useful?

Background: This study aimed to evaluate ChatGPT’s performance on questions about periprosthetic joint infections (PJI) of the hip and knee. Methods: Twenty-seven questions from the 2018 International Consensus Meeting on Musculoskeletal Infection were selected for response generation. The free-text...

Bibliographic Details
Main Authors:	Draschl, Alexander; Hauer, Georg; Fischerauer, Stefan Franz; Kogler, Angelika; Leitner, Lukas; Andreou, Dimosthenis; Leithner, Andreas; Sadoghi, Patrick
Format:	Online Article Text
Language:	English
Published:	MDPI 2023
Subjects:	Article
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10607052/ https://www.ncbi.nlm.nih.gov/pubmed/37892793 http://dx.doi.org/10.3390/jcm12206655

_version_	1785127455325945856
author	Draschl, Alexander; Hauer, Georg; Fischerauer, Stefan Franz; Kogler, Angelika; Leitner, Lukas; Andreou, Dimosthenis; Leithner, Andreas; Sadoghi, Patrick
author_sort	Draschl, Alexander
collection	PubMed
description	Background: This study aimed to evaluate ChatGPT’s performance on questions about periprosthetic joint infections (PJI) of the hip and knee. Methods: Twenty-seven questions from the 2018 International Consensus Meeting on Musculoskeletal Infection were selected for response generation. The free-text responses were evaluated by three orthopedic surgeons using a five-point Likert scale. Inter-rater reliability (IRR) was assessed via Fleiss’ kappa (FK). Results: Overall, near-perfect IRR was found for disagreement on the presence of factual errors (FK: 0.880, 95% CI [0.724, 1.035], p < 0.001) and agreement on information completeness (FK: 0.848, 95% CI [0.699, 0.996], p < 0.001). Substantial IRR was observed for disagreement on misleading information (FK: 0.743, 95% CI [0.601, 0.886], p < 0.001) and agreement on suitability for patients (FK: 0.627, 95% CI [0.478, 0.776], p < 0.001). Moderate IRR was observed for agreement on “up-to-dateness” (FK: 0.584, 95% CI [0.434, 0.734], p < 0.001) and suitability for orthopedic surgeons (FK: 0.505, 95% CI [0.383, 0.628], p < 0.001). Question- and subtopic-specific analysis revealed diverse IRR levels ranging from near-perfect to poor. Conclusions: ChatGPT’s free-text responses to complex orthopedic questions were predominantly reliable and useful for orthopedic surgeons and patients. Given variations in performance by question and subtopic, consulting additional sources and exercising careful interpretation should be emphasized for reliable medical decision-making.
format	Online Article Text
id	pubmed-10607052
institution	National Center for Biotechnology Information
language	English
publishDate	2023
publisher	MDPI
record_format	MEDLINE/PubMed
spelling	pubmed-10607052 2023-10-28 J Clin Med Article MDPI 2023-10-20 /pmc/articles/PMC10607052/ /pubmed/37892793 http://dx.doi.org/10.3390/jcm12206655 Text en
license	© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
title	Are ChatGPT’s Free-Text Responses on Periprosthetic Joint Infections of the Hip and Knee Reliable and Useful?
topic	Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10607052/ https://www.ncbi.nlm.nih.gov/pubmed/37892793 http://dx.doi.org/10.3390/jcm12206655
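The abstract reports inter-rater reliability as Fleiss' kappa for three raters scoring on a five-point Likert scale. As an illustration of how such a value is obtained, here is a minimal sketch of the standard Fleiss' kappa formula. The function names and the example ratings are hypothetical and are not taken from the study. The verbal labels follow the commonly used Landis and Koch bands, which are consistent with the terms in the abstract, although the record does not state which scale the authors applied.

```python
from typing import List, Sequence


def ratings_to_counts(ratings: Sequence[Sequence[int]],
                      categories: Sequence[int]) -> List[List[int]]:
    """Turn per-item rater scores into per-item category counts."""
    return [[list(item).count(c) for c in categories] for item in ratings]


def fleiss_kappa(counts: Sequence[Sequence[int]]) -> float:
    """Fleiss' kappa for an items x categories table of rating counts.

    Every row must sum to the same number of raters (at least 2).
    """
    n_items = len(counts)
    n_raters = sum(counts[0])
    if n_items == 0 or n_raters < 2:
        raise ValueError("need at least one item and two raters")
    if any(sum(row) != n_raters for row in counts):
        raise ValueError("every item must be rated by the same number of raters")

    # Observed agreement: mean over items of the share of agreeing rater pairs.
    p_bar = sum(
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ) / n_items

    # Chance agreement: sum of squared overall category proportions.
    total = n_items * n_raters
    n_categories = len(counts[0])
    p_e = sum(
        (sum(row[j] for row in counts) / total) ** 2
        for j in range(n_categories)
    )
    if p_e == 1.0:
        raise ValueError("kappa is undefined when all ratings fall in one category")
    return (p_bar - p_e) / (1.0 - p_e)


def interpret(kappa: float) -> str:
    """Verbal label per the Landis and Koch bands (assumed, see lead-in)."""
    if kappa < 0:
        return "poor"
    if kappa <= 0.20:
        return "slight"
    if kappa <= 0.40:
        return "fair"
    if kappa <= 0.60:
        return "moderate"
    if kappa <= 0.80:
        return "substantial"
    return "almost perfect"


if __name__ == "__main__":
    # Hypothetical Likert scores (1-5) from three raters for four responses.
    example = [[5, 5, 5], [4, 4, 5], [2, 2, 2], [5, 4, 4]]
    table = ratings_to_counts(example, categories=[1, 2, 3, 4, 5])
    k = fleiss_kappa(table)
    print(f"kappa = {k:.3f} ({interpret(k)})")
```

The sketch returns only the point estimate. The confidence intervals and p-values quoted in the abstract need an additional standard-error calculation that is not shown here.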
