Assessing the Efficacy of ChatGPT in Solving Questions Based on the Core Concepts in Physiology
Main Authors:
Format: Online Article (Text)
Language: English
Published: Cureus, 2023
Subjects:
Online Access:
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10492920/
https://www.ncbi.nlm.nih.gov/pubmed/37700949
http://dx.doi.org/10.7759/cureus.43314
Summary:

Background and objective: ChatGPT is a large language model (LLM) generative artificial intelligence (AI) chatbot trained through deep learning to produce human-like text and analyze simple problems across a wide variety of subject areas. However, with regard to facilitating the transfer of learning in medical education, a concern has arisen that while AI is adept at surface-level understanding, it lacks the in-depth knowledge needed to perform at an expert level, particularly in addressing core concepts. In this study, we explored the efficacy of ChatGPT in solving various reasoning questions based on the five core concepts as applied to different modules in the subject of physiology.

Materials and methods: Subject experts created a total of 82 reasoning-type questions spanning six modules and the five core concepts. The questions were posed to the conversational AI tool, and the first response generated for each was scored and analyzed. The Kruskal-Wallis test with post hoc analysis was used to compare scores across the modules and, separately, across the five core concepts.

Results: The overall mean score for the modules (60 questions) was 3.72 ± 0.26, while the mean score for the core concepts (60 questions) was 3.68 ± 0.30. Statistically significant differences were observed among the modules (p=0.05) as well as among the core concepts (p=0.024).

Conclusion: The significant differences observed in scores among the various modules and core concepts highlight the uneven performance of the same software tool across content areas, underscoring the need for further evaluation of AI-enabled learning applications to enhance the transfer of learning among undergraduates.
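For readers unfamiliar with the statistical approach named in the abstract, the sketch below illustrates how a Kruskal-Wallis omnibus test followed by Dunn's post hoc comparison can be run on per-module scores. This is a minimal illustration, not the authors' analysis code: the module names, the score values, and the assumed 0-5 scoring scale are all hypothetical placeholders (the abstract reports means near 3.7 but does not state the scale), and Dunn's test is one common post hoc choice rather than the one the paper necessarily used.

```python
# Minimal sketch of a Kruskal-Wallis test with Dunn's post hoc comparison.
# All data below are hypothetical; the study's actual scores are not public.
from scipy import stats
import scikit_posthocs as sp  # third-party: pip install scikit-posthocs

# Hypothetical per-question scores (assumed 0-5 scale) for three modules.
module_scores = {
    "Module A": [4, 3, 5, 4, 4, 3, 5, 4, 3, 4],
    "Module B": [3, 3, 4, 2, 3, 4, 3, 3, 2, 3],
    "Module C": [5, 4, 4, 5, 4, 5, 4, 4, 5, 4],
}

# Kruskal-Wallis H-test: non-parametric comparison of 3+ independent groups,
# appropriate for ordinal scores that may not be normally distributed.
h_stat, p_value = stats.kruskal(*module_scores.values())
print(f"Kruskal-Wallis H = {h_stat:.3f}, p = {p_value:.4f}")

# If the omnibus test is significant, Dunn's post hoc test identifies which
# pairs of modules differ, with Bonferroni-adjusted p-values.
if p_value <= 0.05:
    posthoc = sp.posthoc_dunn(list(module_scores.values()),
                              p_adjust="bonferroni")
    posthoc.index = posthoc.columns = list(module_scores.keys())
    print(posthoc)
```

The non-parametric pairing shown here is a standard workflow for comparing rubric-style scores across several groups: the Kruskal-Wallis test answers whether any group differs, and the post hoc step localizes the differences while controlling for multiple comparisons.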