Analysing the Applicability of ChatGPT, Bard, and Bing to Generate Reasoning-Based Multiple-Choice Questions in Medical Physiology


Bibliographic Details
Main Authors: Agarwal, Mayank, Sharma, Priyanka, Goswami, Ayan
Format: Online Article Text
Language: English
Published: Cureus 2023
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10372539/
https://www.ncbi.nlm.nih.gov/pubmed/37519497
http://dx.doi.org/10.7759/cureus.40977
_version_ 1785078393047351296
author Agarwal, Mayank
Sharma, Priyanka
Goswami, Ayan
author_facet Agarwal, Mayank
Sharma, Priyanka
Goswami, Ayan
author_sort Agarwal, Mayank
collection PubMed
description Background: Artificial intelligence (AI) is evolving in the medical education system. ChatGPT, Google Bard, and Microsoft Bing are AI-based models that can solve problems in medical education. However, the applicability of AI to create reasoning-based multiple-choice questions (MCQs) in the field of medical physiology is yet to be explored. Objective: We aimed to assess and compare the applicability of ChatGPT, Bard, and Bing in generating reasoning-based MCQs for MBBS (Bachelor of Medicine, Bachelor of Surgery) undergraduate students on the subject of physiology. Methods: The National Medical Commission of India has developed an 11-module physiology curriculum with various competencies. Two physiologists independently chose a competency from each module. A third physiologist prompted all three AIs to generate five MCQs for each chosen competency. The two physiologists who provided the competencies rated the MCQs generated by the AIs on a scale of 0-3 for validity, difficulty, and the reasoning ability required to answer them. We analyzed the average of the two raters' scores using the Kruskal-Wallis test to compare the distributions across the total and module-wise responses, followed by a post-hoc test for pairwise comparisons. We used Cohen's kappa (κ) to assess the agreement in scores between the two raters. We expressed the data as medians with interquartile ranges and considered a p-value <0.05 statistically significant. Results: ChatGPT and Bard each generated 110 MCQs for the chosen competencies. However, Bing provided only 100 MCQs, as it failed to generate them for two competencies. The validity of the MCQs was rated as 3 (3-3) for ChatGPT, 3 (1.5-3) for Bard, and 3 (1.5-3) for Bing, showing a significant difference (p<0.001) among the models. The difficulty of the MCQs was rated as 1 (0-1) for ChatGPT, 1 (1-2) for Bard, and 1 (1-2) for Bing, with a significant difference (p=0.006). The reasoning ability required to answer the MCQs was rated as 1 (1-2) for ChatGPT, 1 (1-2) for Bard, and 1 (1-2) for Bing, with no significant difference (p=0.235). κ was ≥ 0.8 for all three parameters across all three AI models. Conclusion: AI still needs to evolve before it can reliably generate reasoning-based MCQs in medical physiology. ChatGPT, Bard, and Bing all showed limitations: Bing generated MCQs with significantly lower validity, while ChatGPT generated MCQs with significantly lower difficulty.
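The analysis described in the Methods can be illustrated with a minimal sketch. This is not the authors' code: the ratings are randomly generated placeholders, and SciPy's kruskal plus Bonferroni-corrected Mann-Whitney U comparisons stand in for the Kruskal-Wallis test and the unspecified post-hoc test, with scikit-learn's cohen_kappa_score used for inter-rater agreement.

```python
# Minimal sketch (not the authors' code) of the analysis described above,
# run on hypothetical ratings. Two raters score each MCQ 0-3; averaged
# scores are compared across models with a Kruskal-Wallis test and
# Bonferroni-corrected Mann-Whitney U post-hoc comparisons, and Cohen's
# kappa measures inter-rater agreement.
import numpy as np
from scipy.stats import kruskal, mannwhitneyu
from sklearn.metrics import cohen_kappa_score

rng = np.random.default_rng(0)

# Hypothetical (rater1, rater2) validity ratings per MCQ for each model.
ratings = {
    "ChatGPT": (rng.integers(0, 4, 110), rng.integers(0, 4, 110)),
    "Bard":    (rng.integers(0, 4, 110), rng.integers(0, 4, 110)),
    "Bing":    (rng.integers(0, 4, 100), rng.integers(0, 4, 100)),
}

# Inter-rater agreement per model (the abstract reports kappa >= 0.8).
for model, (r1, r2) in ratings.items():
    print(model, "Cohen's kappa:", round(cohen_kappa_score(r1, r2), 2))

# Average the two raters' scores and compare their distributions.
avg = {m: (r1 + r2) / 2 for m, (r1, r2) in ratings.items()}
h_stat, p_value = kruskal(*avg.values())
print(f"Kruskal-Wallis H={h_stat:.2f}, p={p_value:.4f}")

# Post-hoc pairwise comparisons if the omnibus test is significant.
pairs = [("ChatGPT", "Bard"), ("ChatGPT", "Bing"), ("Bard", "Bing")]
if p_value < 0.05:
    for a, b in pairs:
        _, p_pair = mannwhitneyu(avg[a], avg[b])
        print(f"{a} vs {b}: corrected p={min(p_pair * len(pairs), 1.0):.4f}")

# Report each model's median score with its interquartile range.
for model, scores in avg.items():
    q1, median, q3 = np.percentile(scores, [25, 50, 75])
    print(f"{model}: median {median} (IQR {q1}-{q3})")
```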
format Online
Article
Text
id pubmed-10372539
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher Cureus
record_format MEDLINE/PubMed
spelling pubmed-10372539 2023-07-28 Analysing the Applicability of ChatGPT, Bard, and Bing to Generate Reasoning-Based Multiple-Choice Questions in Medical Physiology Agarwal, Mayank Sharma, Priyanka Goswami, Ayan Cureus Medical Education Background: Artificial intelligence (AI) is evolving in the medical education system. ChatGPT, Google Bard, and Microsoft Bing are AI-based models that can solve problems in medical education. However, the applicability of AI to create reasoning-based multiple-choice questions (MCQs) in the field of medical physiology is yet to be explored. Objective: We aimed to assess and compare the applicability of ChatGPT, Bard, and Bing in generating reasoning-based MCQs for MBBS (Bachelor of Medicine, Bachelor of Surgery) undergraduate students on the subject of physiology. Methods: The National Medical Commission of India has developed an 11-module physiology curriculum with various competencies. Two physiologists independently chose a competency from each module. A third physiologist prompted all three AIs to generate five MCQs for each chosen competency. The two physiologists who provided the competencies rated the MCQs generated by the AIs on a scale of 0-3 for validity, difficulty, and the reasoning ability required to answer them. We analyzed the average of the two raters' scores using the Kruskal-Wallis test to compare the distributions across the total and module-wise responses, followed by a post-hoc test for pairwise comparisons. We used Cohen's kappa (κ) to assess the agreement in scores between the two raters. We expressed the data as medians with interquartile ranges and considered a p-value <0.05 statistically significant. Results: ChatGPT and Bard each generated 110 MCQs for the chosen competencies. However, Bing provided only 100 MCQs, as it failed to generate them for two competencies. The validity of the MCQs was rated as 3 (3-3) for ChatGPT, 3 (1.5-3) for Bard, and 3 (1.5-3) for Bing, showing a significant difference (p<0.001) among the models. The difficulty of the MCQs was rated as 1 (0-1) for ChatGPT, 1 (1-2) for Bard, and 1 (1-2) for Bing, with a significant difference (p=0.006). The reasoning ability required to answer the MCQs was rated as 1 (1-2) for ChatGPT, 1 (1-2) for Bard, and 1 (1-2) for Bing, with no significant difference (p=0.235). κ was ≥ 0.8 for all three parameters across all three AI models. Conclusion: AI still needs to evolve before it can reliably generate reasoning-based MCQs in medical physiology. ChatGPT, Bard, and Bing all showed limitations: Bing generated MCQs with significantly lower validity, while ChatGPT generated MCQs with significantly lower difficulty. Cureus 2023-06-26 /pmc/articles/PMC10372539/ /pubmed/37519497 http://dx.doi.org/10.7759/cureus.40977 Text en Copyright © 2023, Agarwal et al. https://creativecommons.org/licenses/by/3.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
spellingShingle Medical Education
Agarwal, Mayank
Sharma, Priyanka
Goswami, Ayan
Analysing the Applicability of ChatGPT, Bard, and Bing to Generate Reasoning-Based Multiple-Choice Questions in Medical Physiology
title Analysing the Applicability of ChatGPT, Bard, and Bing to Generate Reasoning-Based Multiple-Choice Questions in Medical Physiology
title_full Analysing the Applicability of ChatGPT, Bard, and Bing to Generate Reasoning-Based Multiple-Choice Questions in Medical Physiology
title_fullStr Analysing the Applicability of ChatGPT, Bard, and Bing to Generate Reasoning-Based Multiple-Choice Questions in Medical Physiology
title_full_unstemmed Analysing the Applicability of ChatGPT, Bard, and Bing to Generate Reasoning-Based Multiple-Choice Questions in Medical Physiology
title_short Analysing the Applicability of ChatGPT, Bard, and Bing to Generate Reasoning-Based Multiple-Choice Questions in Medical Physiology
title_sort analysing the applicability of chatgpt, bard, and bing to generate reasoning-based multiple-choice questions in medical physiology
topic Medical Education
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10372539/
https://www.ncbi.nlm.nih.gov/pubmed/37519497
http://dx.doi.org/10.7759/cureus.40977
work_keys_str_mv AT agarwalmayank analysingtheapplicabilityofchatgptbardandbingtogeneratereasoningbasedmultiplechoicequestionsinmedicalphysiology
AT sharmapriyanka analysingtheapplicabilityofchatgptbardandbingtogeneratereasoningbasedmultiplechoicequestionsinmedicalphysiology
AT goswamiayan analysingtheapplicabilityofchatgptbardandbingtogeneratereasoningbasedmultiplechoicequestionsinmedicalphysiology