Assessing the Performance of GPT-3.5 and GPT-4 on the 2023 Japanese Nursing Examination
Main authors: Kaneda, Yudai; Takahashi, Ryo; Kaneda, Uiri; Akashima, Shiori; Okita, Haruna; Misaki, Sadaya; Yamashiro, Akimi; Ozaki, Akihiko; Tanimoto, Tetsuya
Format: Online Article Text
Language: English
Published: Cureus, 2023
Subjects: Medical Simulation
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10475149/ https://www.ncbi.nlm.nih.gov/pubmed/37667724 http://dx.doi.org/10.7759/cureus.42924
_version_ | 1785100659723337728 |
author | Kaneda, Yudai; Takahashi, Ryo; Kaneda, Uiri; Akashima, Shiori; Okita, Haruna; Misaki, Sadaya; Yamashiro, Akimi; Ozaki, Akihiko; Tanimoto, Tetsuya |
author_facet | Kaneda, Yudai; Takahashi, Ryo; Kaneda, Uiri; Akashima, Shiori; Okita, Haruna; Misaki, Sadaya; Yamashiro, Akimi; Ozaki, Akihiko; Tanimoto, Tetsuya |
author_sort | Kaneda, Yudai |
collection | PubMed |
description | Purpose The purpose of this study was to evaluate the changes in capabilities between the Generative Pre-trained Transformer (GPT)-3.5 and GPT-4 versions of the large-scale language model ChatGPT within a Japanese medical context. Methods The study involved ChatGPT versions 3.5 and 4 responding to questions from the 112th Japanese National Nursing Examination (JNNE). The study comprised three analyses: correct answer rate and score rate calculations, comparisons between GPT-3.5 and GPT-4, and comparisons of correct answer rates for conversation questions. Results ChatGPT versions 3.5 and 4 responded to 237 out of 238 Japanese questions from the 112th JNNE. While GPT-3.5 achieved an overall accuracy rate of 59.9%, failing to meet the passing standards in compulsory and general/scenario-based questions, scoring 58.0% and 58.3%, respectively, GPT-4 had an accuracy rate of 79.7%, satisfying the passing standards by scoring 90.0% and 77.7%, respectively. For each problem type, GPT-4 showed a higher accuracy rate than GPT-3.5. Specifically, the accuracy rates for compulsory questions improved from 58.0% with GPT-3.5 to 90.0% with GPT-4. For general questions, the rates went from 64.6% with GPT-3.5 to 75.6% with GPT-4. In scenario-based questions, the accuracy rates improved substantially from 51.7% with GPT-3.5 to 80.0% with GPT-4. For conversation questions, GPT-3.5 had an accuracy rate of 73.3% and GPT-4 had an accuracy rate of 93.3%. Conclusions The GPT-4 version of ChatGPT displayed performance sufficient to pass the JNNE, significantly improving from GPT-3.5. This suggests specialized medical training could make such models beneficial in Japanese clinical settings, aiding decision-making. However, user awareness and training are crucial, given potential inaccuracies in ChatGPT's responses. Hence, responsible usage with an understanding of its capabilities and limitations is vital to best support healthcare professionals and patients. |
format | Online Article Text |
id | pubmed-10475149 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2023 |
publisher | Cureus |
record_format | MEDLINE/PubMed |
spelling | pubmed-104751492023-09-04 Assessing the Performance of GPT-3.5 and GPT-4 on the 2023 Japanese Nursing Examination Kaneda, Yudai Takahashi, Ryo Kaneda, Uiri Akashima, Shiori Okita, Haruna Misaki, Sadaya Yamashiro, Akimi Ozaki, Akihiko Tanimoto, Tetsuya Cureus Medical Simulation Purpose The purpose of this study was to evaluate the changes in capabilities between the Generative Pre-trained Transformer (GPT)-3.5 and GPT-4 versions of the large-scale language model ChatGPT within a Japanese medical context. Methods The study involved ChatGPT versions 3.5 and 4 responding to questions from the 112th Japanese National Nursing Examination (JNNE). The study comprised three analyses: correct answer rate and score rate calculations, comparisons between GPT-3.5 and GPT-4, and comparisons of correct answer rates for conversation questions. Results ChatGPT versions 3.5 and 4 responded to 237 out of 238 Japanese questions from the 112th JNNE. While GPT-3.5 achieved an overall accuracy rate of 59.9%, failing to meet the passing standards in compulsory and general/scenario-based questions, scoring 58.0% and 58.3%, respectively, GPT-4 had an accuracy rate of 79.7%, satisfying the passing standards by scoring 90.0% and 77.7%, respectively. For each problem type, GPT-4 showed a higher accuracy rate than GPT-3.5. Specifically, the accuracy rates for compulsory questions improved from 58.0% with GPT-3.5 to 90.0% with GPT-4. For general questions, the rates went from 64.6% with GPT-3.5 to 75.6% with GPT-4. In scenario-based questions, the accuracy rates improved substantially from 51.7% with GPT-3.5 to 80.0% with GPT-4. For conversation questions, GPT-3.5 had an accuracy rate of 73.3% and GPT-4 had an accuracy rate of 93.3%. Conclusions The GPT-4 version of ChatGPT displayed performance sufficient to pass the JNNE, significantly improving from GPT-3.5. 
This suggests specialized medical training could make such models beneficial in Japanese clinical settings, aiding decision-making. However, user awareness and training are crucial, given potential inaccuracies in ChatGPT's responses. Hence, responsible usage with an understanding of its capabilities and limitations is vital to best support healthcare professionals and patients. Cureus 2023-08-03 /pmc/articles/PMC10475149/ /pubmed/37667724 http://dx.doi.org/10.7759/cureus.42924 Text en Copyright © 2023, Kaneda et al. https://creativecommons.org/licenses/by/3.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. |
spellingShingle | Medical Simulation Kaneda, Yudai Takahashi, Ryo Kaneda, Uiri Akashima, Shiori Okita, Haruna Misaki, Sadaya Yamashiro, Akimi Ozaki, Akihiko Tanimoto, Tetsuya Assessing the Performance of GPT-3.5 and GPT-4 on the 2023 Japanese Nursing Examination |
title | Assessing the Performance of GPT-3.5 and GPT-4 on the 2023 Japanese Nursing Examination |
title_full | Assessing the Performance of GPT-3.5 and GPT-4 on the 2023 Japanese Nursing Examination |
title_fullStr | Assessing the Performance of GPT-3.5 and GPT-4 on the 2023 Japanese Nursing Examination |
title_full_unstemmed | Assessing the Performance of GPT-3.5 and GPT-4 on the 2023 Japanese Nursing Examination |
title_short | Assessing the Performance of GPT-3.5 and GPT-4 on the 2023 Japanese Nursing Examination |
title_sort | assessing the performance of gpt-3.5 and gpt-4 on the 2023 japanese nursing examination |
topic | Medical Simulation |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10475149/ https://www.ncbi.nlm.nih.gov/pubmed/37667724 http://dx.doi.org/10.7759/cureus.42924 |
work_keys_str_mv | AT kanedayudai assessingtheperformanceofgpt35andgpt4onthe2023japanesenursingexamination AT takahashiryo assessingtheperformanceofgpt35andgpt4onthe2023japanesenursingexamination AT kanedauiri assessingtheperformanceofgpt35andgpt4onthe2023japanesenursingexamination AT akashimashiori assessingtheperformanceofgpt35andgpt4onthe2023japanesenursingexamination AT okitaharuna assessingtheperformanceofgpt35andgpt4onthe2023japanesenursingexamination AT misakisadaya assessingtheperformanceofgpt35andgpt4onthe2023japanesenursingexamination AT yamashiroakimi assessingtheperformanceofgpt35andgpt4onthe2023japanesenursingexamination AT ozakiakihiko assessingtheperformanceofgpt35andgpt4onthe2023japanesenursingexamination AT tanimototetsuya assessingtheperformanceofgpt35andgpt4onthe2023japanesenursingexamination |