
Performance of GPT-3.5 and GPT-4 on the Japanese Medical Licensing Examination: Comparison Study


Bibliographic Details
Main Authors: Takagi, Soshi; Watari, Takashi; Erabi, Ayano; Sakaguchi, Kota
Format: Online; Article; Text
Language: English
Published: JMIR Publications, 2023
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10365615/
https://www.ncbi.nlm.nih.gov/pubmed/37384388
http://dx.doi.org/10.2196/48002
author Takagi, Soshi
Watari, Takashi
Erabi, Ayano
Sakaguchi, Kota
collection PubMed
description BACKGROUND: The competence of ChatGPT (Chat Generative Pre-Trained Transformer) in non-English languages is not well studied. OBJECTIVE: This study compared the performances of GPT-3.5 (Generative Pre-trained Transformer) and GPT-4 on the Japanese Medical Licensing Examination (JMLE) to evaluate the reliability of these models for clinical reasoning and medical knowledge in non-English languages. METHODS: This study used the default mode of ChatGPT, which is based on GPT-3.5; the GPT-4 model of ChatGPT Plus; and the 117th JMLE in 2023. A total of 254 questions were included in the final analysis, which were categorized into 3 types, namely general, clinical, and clinical sentence questions. RESULTS: The results indicated that GPT-4 outperformed GPT-3.5 in terms of accuracy, particularly for general, clinical, and clinical sentence questions. GPT-4 also performed better on difficult questions and specific disease questions. Furthermore, GPT-4 achieved the passing criteria for the JMLE, indicating its reliability for clinical reasoning and medical knowledge in non-English languages. CONCLUSIONS: GPT-4 could become a valuable tool for medical education and clinical support in non–English-speaking regions, such as Japan.
format Online
Article
Text
id pubmed-10365615
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher JMIR Publications
record_format MEDLINE/PubMed
spelling pubmed-10365615 2023-07-25 Takagi, Soshi; Watari, Takashi; Erabi, Ayano; Sakaguchi, Kota. JMIR Med Educ, Original Paper. JMIR Publications, 2023-06-29. /pmc/articles/PMC10365615/ /pubmed/37384388 http://dx.doi.org/10.2196/48002
©Soshi Takagi, Takashi Watari, Ayano Erabi, Kota Sakaguchi. Originally published in JMIR Medical Education (https://mededu.jmir.org), 29.06.2023.
License: https://creativecommons.org/licenses/by/4.0/
This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in JMIR Medical Education, is properly cited. The complete bibliographic information, a link to the original publication on https://mededu.jmir.org/, as well as this copyright and license information must be included.
title Performance of GPT-3.5 and GPT-4 on the Japanese Medical Licensing Examination: Comparison Study
topic Original Paper