Artificial Intelligence in Medical Education: Comparative Analysis of ChatGPT, Bing, and Medical Students in Germany

Bibliographic Details
Main Authors: Roos, Jonas, Kasapovic, Adnan, Jansen, Tom, Kaczmarczyk, Robert
Format: Online Article Text
Language: English
Published: JMIR Publications 2023
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10507517/
https://www.ncbi.nlm.nih.gov/pubmed/37665620
http://dx.doi.org/10.2196/46482
_version_ 1785107336067547136
author Roos, Jonas
Kasapovic, Adnan
Jansen, Tom
Kaczmarczyk, Robert
author_facet Roos, Jonas
Kasapovic, Adnan
Jansen, Tom
Kaczmarczyk, Robert
author_sort Roos, Jonas
collection PubMed
description BACKGROUND: Large language models (LLMs) have demonstrated significant potential in diverse domains, including medicine. Nonetheless, there is a scarcity of studies examining their performance in medical examinations, especially those conducted in languages other than English, and in direct comparison with medical students. Analyzing the performance of LLMs in state medical examinations can provide insights into their capabilities and limitations and evaluate their potential role in medical education and examination preparation.  OBJECTIVE: This study aimed to assess and compare the performance of 3 LLMs, GPT-4, Bing, and GPT-3.5-Turbo, in the German Medical State Examinations of 2022 and to evaluate their performance relative to that of medical students.  METHODS: The LLMs were assessed on a total of 630 questions from the spring and fall German Medical State Examinations of 2022. The performance was evaluated with and without media-related questions. Statistical analyses included 1-way ANOVA and independent samples t tests for pairwise comparisons. The relative strength of the LLMs in comparison with that of the students was also evaluated.  RESULTS: GPT-4 achieved the highest overall performance, correctly answering 88.1% of questions, closely followed by Bing (86.0%) and GPT-3.5-Turbo (65.7%). The students had an average correct answer rate of 74.6%. Both GPT-4 and Bing significantly outperformed the students in both examinations. When media questions were excluded, Bing achieved the highest performance of 90.7%, closely followed by GPT-4 (90.4%), while GPT-3.5-Turbo lagged (68.2%). There was a significant decline in the performance of GPT-4 and Bing in the fall 2022 examination, which was attributed to a higher proportion of media-related questions and a potential increase in question difficulty.  CONCLUSIONS: LLMs, particularly GPT-4 and Bing, demonstrate potential as valuable tools in medical education and for pretesting examination questions. 
Their high performance, even relative to that of medical students, indicates promising avenues for further development and integration into the educational and clinical landscape. 
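The analysis described in the METHODS section (a 1-way ANOVA across the three models followed by independent-samples t tests for pairwise comparisons) can be sketched as follows. This is an illustrative reconstruction, not the study's actual code or data: the per-question correctness vectors are simulated from the reported overall accuracy rates, and SciPy's `f_oneway` and `ttest_ind` stand in for whatever statistics software the authors used.

```python
# Hypothetical sketch of the study's statistical comparison:
# 1-way ANOVA across three LLMs, then pairwise independent-samples t tests.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Simulated per-question correctness (1 = correct) for 630 questions,
# using the reported overall accuracy rates as Bernoulli probabilities.
gpt4 = rng.binomial(1, 0.881, size=630)
bing = rng.binomial(1, 0.860, size=630)
gpt35 = rng.binomial(1, 0.657, size=630)

# 1-way ANOVA across the three models.
f_stat, p_anova = stats.f_oneway(gpt4, bing, gpt35)
print(f"ANOVA: F = {f_stat:.2f}, p = {p_anova:.4g}")

# Pairwise independent-samples t tests.
pairs = [("GPT-4", gpt4, "Bing", bing),
         ("GPT-4", gpt4, "GPT-3.5-Turbo", gpt35),
         ("Bing", bing, "GPT-3.5-Turbo", gpt35)]
for name_a, a, name_b, b in pairs:
    t, p = stats.ttest_ind(a, b)
    print(f"{name_a} vs {name_b}: t = {t:.2f}, p = {p:.4g}")
```

With gaps this large (roughly 66% vs 86%-88% over 630 questions), the ANOVA and the comparisons against GPT-3.5-Turbo come out strongly significant, while the GPT-4 vs Bing contrast is much closer, mirroring the pattern the abstract reports.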
format Online
Article
Text
id pubmed-10507517
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher JMIR Publications
record_format MEDLINE/PubMed
spelling pubmed-105075172023-09-20 Artificial Intelligence in Medical Education: Comparative Analysis of ChatGPT, Bing, and Medical Students in Germany Roos, Jonas Kasapovic, Adnan Jansen, Tom Kaczmarczyk, Robert JMIR Med Educ Original Paper JMIR Publications 2023-09-04 /pmc/articles/PMC10507517/ /pubmed/37665620 http://dx.doi.org/10.2196/46482 Text en ©Jonas Roos, Adnan Kasapovic, Tom Jansen, Robert Kaczmarczyk. Originally published in JMIR Medical Education (https://mededu.jmir.org), 04.09.2023. This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in JMIR Medical Education, is properly cited. The complete bibliographic information, a link to the original publication on https://mededu.jmir.org/, as well as this copyright and license information must be included.
spellingShingle Original Paper
Roos, Jonas
Kasapovic, Adnan
Jansen, Tom
Kaczmarczyk, Robert
Artificial Intelligence in Medical Education: Comparative Analysis of ChatGPT, Bing, and Medical Students in Germany
title Artificial Intelligence in Medical Education: Comparative Analysis of ChatGPT, Bing, and Medical Students in Germany
title_full Artificial Intelligence in Medical Education: Comparative Analysis of ChatGPT, Bing, and Medical Students in Germany
title_fullStr Artificial Intelligence in Medical Education: Comparative Analysis of ChatGPT, Bing, and Medical Students in Germany
title_full_unstemmed Artificial Intelligence in Medical Education: Comparative Analysis of ChatGPT, Bing, and Medical Students in Germany
title_short Artificial Intelligence in Medical Education: Comparative Analysis of ChatGPT, Bing, and Medical Students in Germany
title_sort artificial intelligence in medical education: comparative analysis of chatgpt, bing, and medical students in germany
topic Original Paper
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10507517/
https://www.ncbi.nlm.nih.gov/pubmed/37665620
http://dx.doi.org/10.2196/46482
work_keys_str_mv AT roosjonas artificialintelligenceinmedicaleducationcomparativeanalysisofchatgptbingandmedicalstudentsingermany
AT kasapovicadnan artificialintelligenceinmedicaleducationcomparativeanalysisofchatgptbingandmedicalstudentsingermany
AT jansentom artificialintelligenceinmedicaleducationcomparativeanalysisofchatgptbingandmedicalstudentsingermany
AT kaczmarczykrobert artificialintelligenceinmedicaleducationcomparativeanalysisofchatgptbingandmedicalstudentsingermany