Diagnostic Accuracy of Differential-Diagnosis Lists Generated by Generative Pretrained Transformer 3 Chatbot for Clinical Vignettes with Common Chief Complaints: A Pilot Study

The diagnostic accuracy of differential diagnoses generated by artificial intelligence (AI) chatbots, including the Generative Pretrained Transformer 3 (GPT-3) chatbot (ChatGPT-3), is unknown. This study evaluated the accuracy of differential-diagnosis lists generated by ChatGPT-3 for clinical vignettes with common chief complaints. General internal medicine physicians created clinical cases, correct diagnoses, and five differential diagnoses for ten common chief complaints. The rate of correct diagnosis by ChatGPT-3 within the ten differential-diagnosis lists was 28/30 (93.3%). The rate of correct diagnosis by physicians was still superior to that by ChatGPT-3 within the five differential-diagnosis lists (98.3% vs. 83.3%, p = 0.03). The rate of correct diagnosis by physicians was also superior to that by ChatGPT-3 for the top diagnosis (93.3% vs. 53.3%, p < 0.001). The rate of consistent differential diagnoses among physicians within the ten differential-diagnosis lists generated by ChatGPT-3 was 62/88 (70.5%). In summary, this study demonstrates the high diagnostic accuracy of differential-diagnosis lists generated by ChatGPT-3 for clinical cases with common chief complaints, suggesting that AI chatbots such as ChatGPT-3 can generate well-differentiated diagnosis lists for common chief complaints. However, the ordering of these lists can still be improved.
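As a quick sanity check on the quoted figures, the two rates the abstract reports as explicit fractions (28/30 and 62/88) can be recomputed and rounded to the percentages it cites. The short Python sketch below is illustrative arithmetic only; the denominators behind the other percentages and the statistical test behind the p-values are not stated in this record.

    # Illustrative arithmetic: recompute the two rates the abstract
    # gives as explicit fractions, rounded to one decimal place.
    def rate(numerator: int, denominator: int) -> str:
        """Format a count as 'n/d (xx.x%)'."""
        return f"{numerator}/{denominator} ({100 * numerator / denominator:.1f}%)"

    print(rate(28, 30))  # 28/30 (93.3%): ChatGPT-3, ten-item lists
    print(rate(62, 88))  # 62/88 (70.5%): physician-consistent differential diagnoses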

Bibliographic Details
Main Authors: Hirosawa, Takanobu; Harada, Yukinori; Yokose, Masashi; Sakamoto, Tetsu; Kawamura, Ren; Shimizu, Taro
Format: Online Article Text
Language: English
Published: MDPI, 15 February 2023
Journal: Int J Environ Res Public Health
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9967747/
https://www.ncbi.nlm.nih.gov/pubmed/36834073
http://dx.doi.org/10.3390/ijerph20043378
License: © 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).