
Assessing the Performance of GPT-3.5 and GPT-4 on the 2023 Japanese Nursing Examination



Bibliographic Details
Main Authors: Kaneda, Yudai, Takahashi, Ryo, Kaneda, Uiri, Akashima, Shiori, Okita, Haruna, Misaki, Sadaya, Yamashiro, Akimi, Ozaki, Akihiko, Tanimoto, Tetsuya
Format: Online Article Text
Language: English
Published: Cureus 2023
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10475149/
https://www.ncbi.nlm.nih.gov/pubmed/37667724
http://dx.doi.org/10.7759/cureus.42924
author Kaneda, Yudai
Takahashi, Ryo
Kaneda, Uiri
Akashima, Shiori
Okita, Haruna
Misaki, Sadaya
Yamashiro, Akimi
Ozaki, Akihiko
Tanimoto, Tetsuya
author_sort Kaneda, Yudai
collection PubMed
description Purpose The purpose of this study was to evaluate the changes in capabilities between the Generative Pre-trained Transformer (GPT)-3.5 and GPT-4 versions of the large-scale language model ChatGPT within a Japanese medical context. Methods The study involved ChatGPT versions 3.5 and 4 responding to questions from the 112th Japanese National Nursing Examination (JNNE). The study comprised three analyses: correct answer rate and score rate calculations, comparisons between GPT-3.5 and GPT-4, and comparisons of correct answer rates for conversation questions. Results ChatGPT versions 3.5 and 4 responded to 237 out of 238 Japanese questions from the 112th JNNE. While GPT-3.5 achieved an overall accuracy rate of 59.9%, failing to meet the passing standards in compulsory and general/scenario-based questions, scoring 58.0% and 58.3%, respectively, GPT-4 had an accuracy rate of 79.7%, satisfying the passing standards by scoring 90.0% and 77.7%, respectively. For each problem type, GPT-4 showed a higher accuracy rate than GPT-3.5. Specifically, the accuracy rates for compulsory questions improved from 58.0% with GPT-3.5 to 90.0% with GPT-4. For general questions, the rates went from 64.6% with GPT-3.5 to 75.6% with GPT-4. In scenario-based questions, the accuracy rates improved substantially from 51.7% with GPT-3.5 to 80.0% with GPT-4. For conversation questions, GPT-3.5 had an accuracy rate of 73.3% and GPT-4 had an accuracy rate of 93.3%. Conclusions The GPT-4 version of ChatGPT displayed performance sufficient to pass the JNNE, significantly improving from GPT-3.5. This suggests specialized medical training could make such models beneficial in Japanese clinical settings, aiding decision-making. However, user awareness and training are crucial, given potential inaccuracies in ChatGPT's responses. Hence, responsible usage with an understanding of its capabilities and limitations is vital to best support healthcare professionals and patients.
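The score-rate comparison reported above reduces to simple arithmetic: a per-section accuracy computed from correct/incorrect marks, checked against section-specific passing cutoffs. The following is a minimal illustrative sketch, not the authors' analysis code; the function names and the threshold values are assumptions for illustration (JNNE passing cutoffs are set per exam year, with the compulsory-question cutoff conventionally around 80%).

```python
def accuracy(results):
    """Fraction of correct answers; results is a list of booleans."""
    return sum(results) / len(results)

def passes_jnne(compulsory, general_scenario,
                compulsory_threshold=0.80, general_threshold=0.60):
    """Both section thresholds must be met to pass.

    Threshold defaults are illustrative assumptions, not the official
    2023 cutoffs.
    """
    return (accuracy(compulsory) >= compulsory_threshold
            and accuracy(general_scenario) >= general_threshold)
```

Under these assumed cutoffs, a model scoring 58.0% on compulsory questions fails regardless of its other scores, while one scoring 90.0% and 77.7% passes, matching the pattern the abstract reports for GPT-3.5 and GPT-4.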
format Online Article Text
id pubmed-10475149
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher Cureus
record_format MEDLINE/PubMed
spelling pubmed-10475149 2023-09-04. Assessing the Performance of GPT-3.5 and GPT-4 on the 2023 Japanese Nursing Examination. Kaneda, Yudai; Takahashi, Ryo; Kaneda, Uiri; Akashima, Shiori; Okita, Haruna; Misaki, Sadaya; Yamashiro, Akimi; Ozaki, Akihiko; Tanimoto, Tetsuya. Cureus (Medical Simulation), published 2023-08-03. /pmc/articles/PMC10475149/ /pubmed/37667724 http://dx.doi.org/10.7759/cureus.42924. Copyright © 2023, Kaneda et al. This is an open access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/3.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
title Assessing the Performance of GPT-3.5 and GPT-4 on the 2023 Japanese Nursing Examination
topic Medical Simulation