
Performance of the Large Language Model ChatGPT on the National Nurse Examinations in Japan: Evaluation Study

BACKGROUND: ChatGPT, a large language model, has shown good performance on physician certification examinations and medical consultations. However, its performance has not been examined in languages other than English or on nursing examinations. OBJECTIVE: We aimed to evaluate the performance of Cha...


Bibliographic Details
Main Authors:	Taira, Kazuya, Itaya, Takahiro, Hanada, Ayame
Format:	Online Article Text
Language:	English
Published:	JMIR Publications 2023
Subjects:	Original Paper
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10337249/ https://www.ncbi.nlm.nih.gov/pubmed/37368470 http://dx.doi.org/10.2196/47305

_version_	1785071379816644608
author	Taira, Kazuya Itaya, Takahiro Hanada, Ayame
author_facet	Taira, Kazuya Itaya, Takahiro Hanada, Ayame
author_sort	Taira, Kazuya
collection	PubMed
description	BACKGROUND: ChatGPT, a large language model, has shown good performance on physician certification examinations and medical consultations. However, its performance has not been examined in languages other than English or on nursing examinations. OBJECTIVE: We aimed to evaluate the performance of ChatGPT on the Japanese National Nurse Examinations. METHODS: We evaluated the percentages of correct answers provided by ChatGPT (GPT-3.5) for all questions on the Japanese National Nurse Examinations from 2019 to 2023, excluding inappropriate questions and those containing images. Inappropriate questions were pointed out by a third-party organization and announced by the government to be excluded from scoring. Specifically, these include “questions with inappropriate question difficulty” and “questions with errors in the questions or choices.” These examinations consist of 240 questions each year, divided into basic knowledge questions that test the basic issues of particular importance to nurses and general questions that test a wide range of specialized knowledge. Furthermore, the questions had 2 types of formats: simple-choice and situation-setup questions. Simple-choice questions are primarily knowledge-based and multiple-choice, whereas situation-setup questions require the candidate to read a description of a patient's and family's situation and select the nurse's action or the patient's response. Hence, the questions were standardized using 2 types of prompts before requesting answers from ChatGPT. Chi-square tests were conducted to compare the percentage of correct answers for each year's examination format and specialty area related to the question. In addition, a Cochran-Armitage trend test was performed on the percentages of correct answers from 2019 to 2023. RESULTS: The 5-year average percentage of correct answers for ChatGPT was 75.1% (SD 3%) for basic knowledge questions and 64.5% (SD 5%) for general questions. The percentage of correct answers was highest on the 2019 examination, at 80% for basic knowledge questions and 71.2% for general questions. ChatGPT met the passing criteria for the 2019 Japanese National Nurse Examination and was close to passing the 2020-2023 examinations, with only a few more correct answers required to pass. ChatGPT had a lower percentage of correct answers in some areas, such as pharmacology, social welfare, related law and regulations, endocrinology/metabolism, and dermatology, and a higher percentage of correct answers in the areas of nutrition, pathology, hematology, ophthalmology, otolaryngology, dentistry and dental surgery, and nursing integration and practice. CONCLUSIONS: Of the examinations from the most recent 5 years, ChatGPT passed only the 2019 Japanese National Nursing Examination. Although it did not pass the examinations from other years, it performed very close to the passing level, even in those containing questions related to psychology, communication, and nursing.
format	Online Article Text
id	pubmed-10337249
institution	National Center for Biotechnology Information
language	English
publishDate	2023
publisher	JMIR Publications
record_format	MEDLINE/PubMed
spelling	pubmed-10337249 2023-07-13 Performance of the Large Language Model ChatGPT on the National Nurse Examinations in Japan: Evaluation Study Taira, Kazuya Itaya, Takahiro Hanada, Ayame JMIR Nurs Original Paper BACKGROUND: ChatGPT, a large language model, has shown good performance on physician certification examinations and medical consultations. However, its performance has not been examined in languages other than English or on nursing examinations. OBJECTIVE: We aimed to evaluate the performance of ChatGPT on the Japanese National Nurse Examinations. METHODS: We evaluated the percentages of correct answers provided by ChatGPT (GPT-3.5) for all questions on the Japanese National Nurse Examinations from 2019 to 2023, excluding inappropriate questions and those containing images. Inappropriate questions were pointed out by a third-party organization and announced by the government to be excluded from scoring. Specifically, these include “questions with inappropriate question difficulty” and “questions with errors in the questions or choices.” These examinations consist of 240 questions each year, divided into basic knowledge questions that test the basic issues of particular importance to nurses and general questions that test a wide range of specialized knowledge. Furthermore, the questions had 2 types of formats: simple-choice and situation-setup questions. Simple-choice questions are primarily knowledge-based and multiple-choice, whereas situation-setup questions entail the candidate reading a patient’s and family situation’s description, and selecting the nurse's action or patient's response. Hence, the questions were standardized using 2 types of prompts before requesting answers from ChatGPT. Chi-square tests were conducted to compare the percentage of correct answers for each year's examination format and specialty area related to the question. In addition, a Cochran-Armitage trend test was performed with the percentage of correct answers from 2019 to 2023. RESULTS: The 5-year average percentage of correct answers for ChatGPT was 75.1% (SD 3%) for basic knowledge questions and 64.5% (SD 5%) for general questions. The highest percentage of correct answers on the 2019 examination was 80% for basic knowledge questions and 71.2% for general questions. ChatGPT met the passing criteria for the 2019 Japanese National Nurse Examination and was close to passing the 2020-2023 examinations, with only a few more correct answers required to pass. ChatGPT had a lower percentage of correct answers in some areas, such as pharmacology, social welfare, related law and regulations, endocrinology/metabolism, and dermatology, and a higher percentage of correct answers in the areas of nutrition, pathology, hematology, ophthalmology, otolaryngology, dentistry and dental surgery, and nursing integration and practice. CONCLUSIONS: ChatGPT only passed the 2019 Japanese National Nursing Examination during the most recent 5 years. Although it did not pass the examinations from other years, it performed very close to the passing level, even in those containing questions related to psychology, communication, and nursing. JMIR Publications 2023-06-27 /pmc/articles/PMC10337249/ /pubmed/37368470 http://dx.doi.org/10.2196/47305 Text en ©Kazuya Taira, Takahiro Itaya, Ayame Hanada. Originally published in JMIR Nursing (https://nursing.jmir.org), 27.06.2023. https://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in JMIR Nursing, is properly cited. The complete bibliographic information, a link to the original publication on https://nursing.jmir.org/, as well as this copyright and license information must be included.
spellingShingle	Original Paper Taira, Kazuya Itaya, Takahiro Hanada, Ayame Performance of the Large Language Model ChatGPT on the National Nurse Examinations in Japan: Evaluation Study
title	Performance of the Large Language Model ChatGPT on the National Nurse Examinations in Japan: Evaluation Study
title_full	Performance of the Large Language Model ChatGPT on the National Nurse Examinations in Japan: Evaluation Study
title_fullStr	Performance of the Large Language Model ChatGPT on the National Nurse Examinations in Japan: Evaluation Study
title_full_unstemmed	Performance of the Large Language Model ChatGPT on the National Nurse Examinations in Japan: Evaluation Study
title_short	Performance of the Large Language Model ChatGPT on the National Nurse Examinations in Japan: Evaluation Study
title_sort	performance of the large language model chatgpt on the national nurse examinations in japan: evaluation study
topic	Original Paper
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10337249/ https://www.ncbi.nlm.nih.gov/pubmed/37368470 http://dx.doi.org/10.2196/47305
work_keys_str_mv	AT tairakazuya performanceofthelargelanguagemodelchatgptonthenationalnurseexaminationsinjapanevaluationstudy AT itayatakahiro performanceofthelargelanguagemodelchatgptonthenationalnurseexaminationsinjapanevaluationstudy AT hanadaayame performanceofthelargelanguagemodelchatgptonthenationalnurseexaminationsinjapanevaluationstudy
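
Note on the trend test named in the abstract: the record does not say how the authors computed it, so the following is only a minimal sketch of a standard two-sided Cochran-Armitage trend test under a normal approximation. The function name, the equally spaced year scores, and the counts in the usage example are all hypothetical; the counts are illustrative and are not the study's data.

```python
import math


def cochran_armitage_trend(successes, totals, scores=None):
    """Two-sided Cochran-Armitage test for a linear trend in proportions.

    successes[i] / totals[i] is the proportion of correct answers in group i
    (for example, one examination year). scores defaults to 0, 1, 2, ...
    Returns (z, p_value) using the normal approximation.
    """
    if scores is None:
        scores = list(range(len(totals)))
    if not (len(successes) == len(totals) == len(scores)):
        raise ValueError("successes, totals, and scores must have equal length")

    n_total = sum(totals)
    p_bar = sum(successes) / n_total  # pooled proportion under the null

    # Score-weighted sum of deviations from the pooled expectation.
    t_stat = sum(t * (r - n * p_bar) for t, r, n in zip(scores, successes, totals))

    # Variance of t_stat under the null hypothesis of no trend.
    sum_nt = sum(n * t for n, t in zip(totals, scores))
    sum_nt2 = sum(n * t * t for n, t in zip(totals, scores))
    variance = p_bar * (1 - p_bar) * (sum_nt2 - sum_nt ** 2 / n_total)
    if variance <= 0:
        raise ValueError("variance is zero; the trend test is undefined")

    z = t_stat / math.sqrt(variance)
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail
    return z, p_value


if __name__ == "__main__":
    # Hypothetical counts for five years; NOT taken from the study.
    correct = [160, 150, 148, 152, 145]
    asked = [230, 232, 231, 233, 229]
    z, p = cochran_armitage_trend(correct, asked)
    print(f"z = {z:.3f}, two-sided p = {p:.4f}")
```

A negative z indicates that the proportion of correct answers falls as the year score increases. Some software uses a slightly different variance convention, so results may differ in the later decimals.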


Similar Items