Diagnostic Accuracy of Differential-Diagnosis Lists Generated by Generative Pretrained Transformer 3 Chatbot for Clinical Vignettes with Common Chief Complaints: A Pilot Study
The diagnostic accuracy of differential diagnoses generated by artificial intelligence (AI) chatbots, including the generative pretrained transformer 3 (GPT-3) chatbot (ChatGPT-3), is unknown. This study evaluated the accuracy of differential-diagnosis lists generated by ChatGPT-3 for clinical vignet...
Main authors: | Hirosawa, Takanobu; Harada, Yukinori; Yokose, Masashi; Sakamoto, Tetsu; Kawamura, Ren; Shimizu, Taro |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | MDPI 2023 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9967747/ https://www.ncbi.nlm.nih.gov/pubmed/36834073 http://dx.doi.org/10.3390/ijerph20043378 |
_version_ | 1784897341999808512 |
---|---|
author | Hirosawa, Takanobu Harada, Yukinori Yokose, Masashi Sakamoto, Tetsu Kawamura, Ren Shimizu, Taro |
author_facet | Hirosawa, Takanobu Harada, Yukinori Yokose, Masashi Sakamoto, Tetsu Kawamura, Ren Shimizu, Taro |
author_sort | Hirosawa, Takanobu |
collection | PubMed |
description | The diagnostic accuracy of differential diagnoses generated by artificial intelligence (AI) chatbots, including the generative pretrained transformer 3 (GPT-3) chatbot (ChatGPT-3), is unknown. This study evaluated the accuracy of differential-diagnosis lists generated by ChatGPT-3 for clinical vignettes with common chief complaints. General internal medicine physicians created clinical cases, correct diagnoses, and five differential diagnoses for ten common chief complaints. The rate of correct diagnosis by ChatGPT-3 within the ten differential-diagnosis lists was 28/30 (93.3%). The rate of correct diagnosis by physicians was still superior to that by ChatGPT-3 within the five differential-diagnosis lists (98.3% vs. 83.3%, p = 0.03). The rate of correct diagnosis by physicians was also superior to that by ChatGPT-3 in the top diagnosis (93.3% vs. 53.3%, p < 0.001). The rate of consistent differential diagnoses among physicians within the ten differential-diagnosis lists generated by ChatGPT-3 was 62/88 (70.5%). In summary, this study demonstrates the high diagnostic accuracy of differential-diagnosis lists generated by ChatGPT-3 for clinical cases with common chief complaints. This suggests that AI chatbots such as ChatGPT-3 can generate a well-differentiated diagnosis list for common chief complaints. However, the order of these lists can be improved in the future. |
format | Online Article Text |
id | pubmed-9967747 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2023 |
publisher | MDPI |
record_format | MEDLINE/PubMed |
spelling | pubmed-9967747 2023-02-27 Diagnostic Accuracy of Differential-Diagnosis Lists Generated by Generative Pretrained Transformer 3 Chatbot for Clinical Vignettes with Common Chief Complaints: A Pilot Study Hirosawa, Takanobu Harada, Yukinori Yokose, Masashi Sakamoto, Tetsu Kawamura, Ren Shimizu, Taro Int J Environ Res Public Health Article The diagnostic accuracy of differential diagnoses generated by artificial intelligence (AI) chatbots, including the generative pretrained transformer 3 (GPT-3) chatbot (ChatGPT-3), is unknown. This study evaluated the accuracy of differential-diagnosis lists generated by ChatGPT-3 for clinical vignettes with common chief complaints. General internal medicine physicians created clinical cases, correct diagnoses, and five differential diagnoses for ten common chief complaints. The rate of correct diagnosis by ChatGPT-3 within the ten differential-diagnosis lists was 28/30 (93.3%). The rate of correct diagnosis by physicians was still superior to that by ChatGPT-3 within the five differential-diagnosis lists (98.3% vs. 83.3%, p = 0.03). The rate of correct diagnosis by physicians was also superior to that by ChatGPT-3 in the top diagnosis (93.3% vs. 53.3%, p < 0.001). The rate of consistent differential diagnoses among physicians within the ten differential-diagnosis lists generated by ChatGPT-3 was 62/88 (70.5%). In summary, this study demonstrates the high diagnostic accuracy of differential-diagnosis lists generated by ChatGPT-3 for clinical cases with common chief complaints. This suggests that AI chatbots such as ChatGPT-3 can generate a well-differentiated diagnosis list for common chief complaints. However, the order of these lists can be improved in the future. MDPI 2023-02-15 /pmc/articles/PMC9967747/ /pubmed/36834073 http://dx.doi.org/10.3390/ijerph20043378 Text en © 2023 by the authors. https://creativecommons.org/licenses/by/4.0/ Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/). |
spellingShingle | Article Hirosawa, Takanobu Harada, Yukinori Yokose, Masashi Sakamoto, Tetsu Kawamura, Ren Shimizu, Taro Diagnostic Accuracy of Differential-Diagnosis Lists Generated by Generative Pretrained Transformer 3 Chatbot for Clinical Vignettes with Common Chief Complaints: A Pilot Study |
title | Diagnostic Accuracy of Differential-Diagnosis Lists Generated by Generative Pretrained Transformer 3 Chatbot for Clinical Vignettes with Common Chief Complaints: A Pilot Study |
title_full | Diagnostic Accuracy of Differential-Diagnosis Lists Generated by Generative Pretrained Transformer 3 Chatbot for Clinical Vignettes with Common Chief Complaints: A Pilot Study |
title_fullStr | Diagnostic Accuracy of Differential-Diagnosis Lists Generated by Generative Pretrained Transformer 3 Chatbot for Clinical Vignettes with Common Chief Complaints: A Pilot Study |
title_full_unstemmed | Diagnostic Accuracy of Differential-Diagnosis Lists Generated by Generative Pretrained Transformer 3 Chatbot for Clinical Vignettes with Common Chief Complaints: A Pilot Study |
title_short | Diagnostic Accuracy of Differential-Diagnosis Lists Generated by Generative Pretrained Transformer 3 Chatbot for Clinical Vignettes with Common Chief Complaints: A Pilot Study |
title_sort | diagnostic accuracy of differential-diagnosis lists generated by generative pretrained transformer 3 chatbot for clinical vignettes with common chief complaints: a pilot study |
topic | Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9967747/ https://www.ncbi.nlm.nih.gov/pubmed/36834073 http://dx.doi.org/10.3390/ijerph20043378 |
work_keys_str_mv | AT hirosawatakanobu diagnosticaccuracyofdifferentialdiagnosislistsgeneratedbygenerativepretrainedtransformer3chatbotforclinicalvignetteswithcommonchiefcomplaintsapilotstudy AT haradayukinori diagnosticaccuracyofdifferentialdiagnosislistsgeneratedbygenerativepretrainedtransformer3chatbotforclinicalvignetteswithcommonchiefcomplaintsapilotstudy AT yokosemasashi diagnosticaccuracyofdifferentialdiagnosislistsgeneratedbygenerativepretrainedtransformer3chatbotforclinicalvignetteswithcommonchiefcomplaintsapilotstudy AT sakamototetsu diagnosticaccuracyofdifferentialdiagnosislistsgeneratedbygenerativepretrainedtransformer3chatbotforclinicalvignetteswithcommonchiefcomplaintsapilotstudy AT kawamuraren diagnosticaccuracyofdifferentialdiagnosislistsgeneratedbygenerativepretrainedtransformer3chatbotforclinicalvignetteswithcommonchiefcomplaintsapilotstudy AT shimizutaro diagnosticaccuracyofdifferentialdiagnosislistsgeneratedbygenerativepretrainedtransformer3chatbotforclinicalvignetteswithcommonchiefcomplaintsapilotstudy |
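
The two rates that the description field reports with explicit counts (28/30 and 62/88) can be re-derived with a few lines of Python. This is only a sanity-check sketch: the helper name `pct` is hypothetical and is not part of the record or the study, and the physician comparison rates are left out because the record gives no counts for them.

```python
def pct(numerator: int, denominator: int) -> float:
    """Return numerator/denominator as a percentage, rounded to one decimal place."""
    return round(100 * numerator / denominator, 1)


# Counts taken directly from the description field of this record.
correct_within_ten = pct(28, 30)    # correct diagnosis within the ten-item lists
consistent_with_phys = pct(62, 88)  # differentials consistent with the physicians' lists

print(correct_within_ten, consistent_with_phys)
```

Both values round to the percentages quoted in the abstract (93.3% and 70.5%).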