Are ChatGPT’s Free-Text Responses on Periprosthetic Joint Infections of the Hip and Knee Reliable and Useful?
Background: This study aimed to evaluate ChatGPT’s performance on questions about periprosthetic joint infections (PJI) of the hip and knee. Methods: Twenty-seven questions from the 2018 International Consensus Meeting on Musculoskeletal Infection were selected for response generation. The free-text...
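Main authors: | Draschl, Alexander; Hauer, Georg; Fischerauer, Stefan Franz; Kogler, Angelika; Leitner, Lukas; Andreou, Dimosthenis; Leithner, Andreas; Sadoghi, Patrick |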
---|---|
Format: | Online Article Text |
Language: | English |
Published: | MDPI 2023 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10607052/ https://www.ncbi.nlm.nih.gov/pubmed/37892793 http://dx.doi.org/10.3390/jcm12206655 |
_version_ | 1785127455325945856 |
---|---|
author | Draschl, Alexander Hauer, Georg Fischerauer, Stefan Franz Kogler, Angelika Leitner, Lukas Andreou, Dimosthenis Leithner, Andreas Sadoghi, Patrick |
author_sort | Draschl, Alexander |
collection | PubMed |
description | Background: This study aimed to evaluate ChatGPT’s performance on questions about periprosthetic joint infections (PJI) of the hip and knee. Methods: Twenty-seven questions from the 2018 International Consensus Meeting on Musculoskeletal Infection were selected for response generation. The free-text responses were evaluated by three orthopedic surgeons using a five-point Likert scale. Inter-rater reliability (IRR) was assessed via Fleiss’ kappa (FK). Results: Overall, near-perfect IRR was found for disagreement on the presence of factual errors (FK: 0.880, 95% CI [0.724, 1.035], p < 0.001) and agreement on information completeness (FK: 0.848, 95% CI [0.699, 0.996], p < 0.001). Substantial IRR was observed for disagreement on misleading information (FK: 0.743, 95% CI [0.601, 0.886], p < 0.001) and agreement on suitability for patients (FK: 0.627, 95% CI [0.478, 0.776], p < 0.001). Moderate IRR was observed for agreement on “up-to-dateness” (FK: 0.584, 95% CI [0.434, 0.734], p < 0.001) and suitability for orthopedic surgeons (FK: 0.505, 95% CI [0.383, 0.628], p < 0.001). Question- and subtopic-specific analysis revealed diverse IRR levels ranging from near-perfect to poor. Conclusions: ChatGPT’s free-text responses to complex orthopedic questions were predominantly reliable and useful for orthopedic surgeons and patients. Given variations in performance by question and subtopic, consulting additional sources and exercising careful interpretation should be emphasized for reliable medical decision-making. |
format | Online Article Text |
id | pubmed-10607052 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2023 |
publisher | MDPI |
record_format | MEDLINE/PubMed |
spelling | pubmed-10607052 2023-10-28 Are ChatGPT’s Free-Text Responses on Periprosthetic Joint Infections of the Hip and Knee Reliable and Useful? Draschl, Alexander Hauer, Georg Fischerauer, Stefan Franz Kogler, Angelika Leitner, Lukas Andreou, Dimosthenis Leithner, Andreas Sadoghi, Patrick J Clin Med Article Background: This study aimed to evaluate ChatGPT’s performance on questions about periprosthetic joint infections (PJI) of the hip and knee. Methods: Twenty-seven questions from the 2018 International Consensus Meeting on Musculoskeletal Infection were selected for response generation. The free-text responses were evaluated by three orthopedic surgeons using a five-point Likert scale. Inter-rater reliability (IRR) was assessed via Fleiss’ kappa (FK). Results: Overall, near-perfect IRR was found for disagreement on the presence of factual errors (FK: 0.880, 95% CI [0.724, 1.035], p < 0.001) and agreement on information completeness (FK: 0.848, 95% CI [0.699, 0.996], p < 0.001). Substantial IRR was observed for disagreement on misleading information (FK: 0.743, 95% CI [0.601, 0.886], p < 0.001) and agreement on suitability for patients (FK: 0.627, 95% CI [0.478, 0.776], p < 0.001). Moderate IRR was observed for agreement on “up-to-dateness” (FK: 0.584, 95% CI [0.434, 0.734], p < 0.001) and suitability for orthopedic surgeons (FK: 0.505, 95% CI [0.383, 0.628], p < 0.001). Question- and subtopic-specific analysis revealed diverse IRR levels ranging from near-perfect to poor. Conclusions: ChatGPT’s free-text responses to complex orthopedic questions were predominantly reliable and useful for orthopedic surgeons and patients. Given variations in performance by question and subtopic, consulting additional sources and exercising careful interpretation should be emphasized for reliable medical decision-making. MDPI 2023-10-20 /pmc/articles/PMC10607052/ /pubmed/37892793 http://dx.doi.org/10.3390/jcm12206655 Text en © 2023 by the authors. https://creativecommons.org/licenses/by/4.0/ Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/). |
spellingShingle | Article Draschl, Alexander Hauer, Georg Fischerauer, Stefan Franz Kogler, Angelika Leitner, Lukas Andreou, Dimosthenis Leithner, Andreas Sadoghi, Patrick Are ChatGPT’s Free-Text Responses on Periprosthetic Joint Infections of the Hip and Knee Reliable and Useful? |
title | Are ChatGPT’s Free-Text Responses on Periprosthetic Joint Infections of the Hip and Knee Reliable and Useful? |
topic | Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10607052/ https://www.ncbi.nlm.nih.gov/pubmed/37892793 http://dx.doi.org/10.3390/jcm12206655 |
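
The description field above reports inter-rater reliability as Fleiss' kappa for three raters scoring on a five-point Likert scale. As a rough illustration of how that statistic is computed, here is a minimal Python sketch. It is not the authors' analysis code: the function name `fleiss_kappa`, its arguments, and the sample ratings are hypothetical, and the record does not say how the Likert scores were grouped before kappa was calculated.

```python
from collections import Counter


def fleiss_kappa(ratings, categories):
    """Fleiss' kappa for a fixed number of raters per item.

    ratings:    list of items, each a list of the labels given by the raters
    categories: list of all admissible labels (e.g. Likert points 1..5)
    """
    n_items = len(ratings)
    n_raters = len(ratings[0])
    if n_raters < 2 or any(len(item) != n_raters for item in ratings):
        raise ValueError("every item needs the same number (>= 2) of ratings")
    if any(label not in categories for item in ratings for label in item):
        raise ValueError("rating outside the declared categories")

    # counts[i][j] = number of raters who assigned category j to item i
    counts = [[Counter(item)[c] for c in categories] for item in ratings]

    # Observed agreement: mean over items of the share of agreeing rater pairs
    per_item = [
        (sum(n * n for n in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ]
    p_observed = sum(per_item) / n_items

    # Chance agreement: sum of squared overall category proportions
    total = n_items * n_raters
    proportions = [
        sum(row[j] for row in counts) / total for j in range(len(categories))
    ]
    p_expected = sum(p * p for p in proportions)

    if p_expected == 1.0:
        raise ValueError("kappa is undefined when all ratings are identical")
    return (p_observed - p_expected) / (1.0 - p_expected)


if __name__ == "__main__":
    # Hypothetical data: 4 responses, each scored by 3 raters on a 1..5 scale
    sample = [[5, 5, 4], [4, 4, 4], [2, 3, 2], [1, 1, 1]]
    print(round(fleiss_kappa(sample, [1, 2, 3, 4, 5]), 3))
```

Kappa is 1 when all raters agree on every item and falls to 0 or below when agreement is no better than chance. The confidence intervals and p-values quoted in the record need a variance estimate that this sketch does not cover.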