Assessing the Performance of GPT-3.5 and GPT-4 on the 2023 Japanese Nursing Examination
Main authors: Kaneda, Yudai; Takahashi, Ryo; Kaneda, Uiri; Akashima, Shiori; Okita, Haruna; Misaki, Sadaya; Yamashiro, Akimi; Ozaki, Akihiko; Tanimoto, Tetsuya
Format: Online Article Text
Language: English
Published: Cureus, 2023
Subjects: Medical Simulation
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10475149/ https://www.ncbi.nlm.nih.gov/pubmed/37667724 http://dx.doi.org/10.7759/cureus.42924
_version_ | 1785100659723337728 |
author | Kaneda, Yudai; Takahashi, Ryo; Kaneda, Uiri; Akashima, Shiori; Okita, Haruna; Misaki, Sadaya; Yamashiro, Akimi; Ozaki, Akihiko; Tanimoto, Tetsuya |
author_facet | Kaneda, Yudai; Takahashi, Ryo; Kaneda, Uiri; Akashima, Shiori; Okita, Haruna; Misaki, Sadaya; Yamashiro, Akimi; Ozaki, Akihiko; Tanimoto, Tetsuya |
author_sort | Kaneda, Yudai |
collection | PubMed |
description | Purpose The purpose of this study was to evaluate the changes in capabilities between the Generative Pre-trained Transformer (GPT)-3.5 and GPT-4 versions of the large-scale language model ChatGPT within a Japanese medical context. Methods The study involved ChatGPT versions 3.5 and 4 responding to questions from the 112th Japanese National Nursing Examination (JNNE). The study comprised three analyses: correct answer rate and score rate calculations, comparisons between GPT-3.5 and GPT-4, and comparisons of correct answer rates for conversation questions. Results ChatGPT versions 3.5 and 4 responded to 237 out of 238 Japanese questions from the 112th JNNE. While GPT-3.5 achieved an overall accuracy rate of 59.9%, failing to meet the passing standards in compulsory and general/scenario-based questions, scoring 58.0% and 58.3%, respectively, GPT-4 had an accuracy rate of 79.7%, satisfying the passing standards by scoring 90.0% and 77.7%, respectively. For each problem type, GPT-4 showed a higher accuracy rate than GPT-3.5. Specifically, the accuracy rates for compulsory questions improved from 58.0% with GPT-3.5 to 90.0% with GPT-4. For general questions, the rates went from 64.6% with GPT-3.5 to 75.6% with GPT-4. In scenario-based questions, the accuracy rates improved substantially from 51.7% with GPT-3.5 to 80.0% with GPT-4. For conversation questions, GPT-3.5 had an accuracy rate of 73.3% and GPT-4 had an accuracy rate of 93.3%. Conclusions The GPT-4 version of ChatGPT displayed performance sufficient to pass the JNNE, significantly improving from GPT-3.5. This suggests specialized medical training could make such models beneficial in Japanese clinical settings, aiding decision-making. However, user awareness and training are crucial, given potential inaccuracies in ChatGPT's responses. Hence, responsible usage with an understanding of its capabilities and limitations is vital to best support healthcare professionals and patients. |
format | Online Article Text |
id | pubmed-10475149 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2023 |
publisher | Cureus |
record_format | MEDLINE/PubMed |
spelling | pubmed-104751492023-09-04 Assessing the Performance of GPT-3.5 and GPT-4 on the 2023 Japanese Nursing Examination Kaneda, Yudai Takahashi, Ryo Kaneda, Uiri Akashima, Shiori Okita, Haruna Misaki, Sadaya Yamashiro, Akimi Ozaki, Akihiko Tanimoto, Tetsuya Cureus Medical Simulation Purpose The purpose of this study was to evaluate the changes in capabilities between the Generative Pre-trained Transformer (GPT)-3.5 and GPT-4 versions of the large-scale language model ChatGPT within a Japanese medical context. Methods The study involved ChatGPT versions 3.5 and 4 responding to questions from the 112th Japanese National Nursing Examination (JNNE). The study comprised three analyses: correct answer rate and score rate calculations, comparisons between GPT-3.5 and GPT-4, and comparisons of correct answer rates for conversation questions. Results ChatGPT versions 3.5 and 4 responded to 237 out of 238 Japanese questions from the 112th JNNE. While GPT-3.5 achieved an overall accuracy rate of 59.9%, failing to meet the passing standards in compulsory and general/scenario-based questions, scoring 58.0% and 58.3%, respectively, GPT-4 had an accuracy rate of 79.7%, satisfying the passing standards by scoring 90.0% and 77.7%, respectively. For each problem type, GPT-4 showed a higher accuracy rate than GPT-3.5. Specifically, the accuracy rates for compulsory questions improved from 58.0% with GPT-3.5 to 90.0% with GPT-4. For general questions, the rates went from 64.6% with GPT-3.5 to 75.6% with GPT-4. In scenario-based questions, the accuracy rates improved substantially from 51.7% with GPT-3.5 to 80.0% with GPT-4. For conversation questions, GPT-3.5 had an accuracy rate of 73.3% and GPT-4 had an accuracy rate of 93.3%. Conclusions The GPT-4 version of ChatGPT displayed performance sufficient to pass the JNNE, significantly improving from GPT-3.5. 
This suggests specialized medical training could make such models beneficial in Japanese clinical settings, aiding decision-making. However, user awareness and training are crucial, given potential inaccuracies in ChatGPT's responses. Hence, responsible usage with an understanding of its capabilities and limitations is vital to best support healthcare professionals and patients. Cureus 2023-08-03 /pmc/articles/PMC10475149/ /pubmed/37667724 http://dx.doi.org/10.7759/cureus.42924 Text en Copyright © 2023, Kaneda et al. https://creativecommons.org/licenses/by/3.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. |
spellingShingle | Medical Simulation Kaneda, Yudai Takahashi, Ryo Kaneda, Uiri Akashima, Shiori Okita, Haruna Misaki, Sadaya Yamashiro, Akimi Ozaki, Akihiko Tanimoto, Tetsuya Assessing the Performance of GPT-3.5 and GPT-4 on the 2023 Japanese Nursing Examination |
title | Assessing the Performance of GPT-3.5 and GPT-4 on the 2023 Japanese Nursing Examination |
title_full | Assessing the Performance of GPT-3.5 and GPT-4 on the 2023 Japanese Nursing Examination |
title_fullStr | Assessing the Performance of GPT-3.5 and GPT-4 on the 2023 Japanese Nursing Examination |
title_full_unstemmed | Assessing the Performance of GPT-3.5 and GPT-4 on the 2023 Japanese Nursing Examination |
title_short | Assessing the Performance of GPT-3.5 and GPT-4 on the 2023 Japanese Nursing Examination |
title_sort | assessing the performance of gpt-3.5 and gpt-4 on the 2023 japanese nursing examination |
topic | Medical Simulation |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10475149/ https://www.ncbi.nlm.nih.gov/pubmed/37667724 http://dx.doi.org/10.7759/cureus.42924 |
work_keys_str_mv | AT kanedayudai assessingtheperformanceofgpt35andgpt4onthe2023japanesenursingexamination AT takahashiryo assessingtheperformanceofgpt35andgpt4onthe2023japanesenursingexamination AT kanedauiri assessingtheperformanceofgpt35andgpt4onthe2023japanesenursingexamination AT akashimashiori assessingtheperformanceofgpt35andgpt4onthe2023japanesenursingexamination AT okitaharuna assessingtheperformanceofgpt35andgpt4onthe2023japanesenursingexamination AT misakisadaya assessingtheperformanceofgpt35andgpt4onthe2023japanesenursingexamination AT yamashiroakimi assessingtheperformanceofgpt35andgpt4onthe2023japanesenursingexamination AT ozakiakihiko assessingtheperformanceofgpt35andgpt4onthe2023japanesenursingexamination AT tanimototetsuya assessingtheperformanceofgpt35andgpt4onthe2023japanesenursingexamination |