
Performance of Large Language Models (ChatGPT, Bing Search, and Google Bard) in Solving Case Vignettes in Physiology

Background Large language models (LLMs) have emerged as powerful tools capable of processing and generating human-like text. These LLMs, such as ChatGPT (OpenAI Incorporated, Mission District, San Francisco, United States), Google Bard (Alphabet Inc., CA, US), and Microsoft Bing (Microsoft Corporati...


Bibliographic Details
Main Authors: Dhanvijay, Anup Kumar D, Pinjar, Mohammed Jaffer, Dhokane, Nitin, Sorte, Smita R, Kumari, Amita, Mondal, Himel
Format: Online Article Text
Language: English
Published: Cureus 2023
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10475852/
https://www.ncbi.nlm.nih.gov/pubmed/37671207
http://dx.doi.org/10.7759/cureus.42972
_version_ 1785100805868617728
author Dhanvijay, Anup Kumar D
Pinjar, Mohammed Jaffer
Dhokane, Nitin
Sorte, Smita R
Kumari, Amita
Mondal, Himel
author_facet Dhanvijay, Anup Kumar D
Pinjar, Mohammed Jaffer
Dhokane, Nitin
Sorte, Smita R
Kumari, Amita
Mondal, Himel
author_sort Dhanvijay, Anup Kumar D
collection PubMed
description Background: Large language models (LLMs) have emerged as powerful tools capable of processing and generating human-like text. These LLMs, such as ChatGPT (OpenAI Incorporated, Mission District, San Francisco, United States), Google Bard (Alphabet Inc., CA, US), and Microsoft Bing (Microsoft Corporation, WA, US), have been applied across various domains, demonstrating their potential to assist in solving complex tasks and improving information accessibility. However, their application in solving case vignettes in physiology has not been explored. This study aimed to assess the performance of three LLMs, namely, ChatGPT (3.5; free research version), Google Bard (Experiment), and Microsoft Bing (Precise), in answering case vignettes in physiology. Methods: This cross-sectional study was conducted in July 2023. A total of 77 case vignettes in physiology were prepared by two physiologists and were validated by two other content experts. These cases were presented to each LLM, and their responses were collected. Two physiologists independently rated the answers provided by the LLMs based on their accuracy. The ratings were measured on a scale from 0 to 4 according to the structure of the observed learning outcome (pre-structural = 0, uni-structural = 1, multi-structural = 2, relational = 3, extended-abstract = 4). The scores among the LLMs were compared by Friedman’s test, and inter-observer agreement was checked by the intraclass correlation coefficient (ICC). Results: The overall scores for ChatGPT, Bing, and Bard across the 77 cases were 3.19±0.3, 2.15±0.6, and 2.91±0.5, respectively (p<0.0001). Hence, ChatGPT 3.5 (free version) obtained the highest score, Bing (Precise) had the lowest score, and Bard (Experiment) fell in between the two in terms of performance. The average ICC values for ChatGPT, Bing, and Bard were 0.858 (95% CI: 0.777 to 0.91, p<0.0001), 0.975 (95% CI: 0.961 to 0.984, p<0.0001), and 0.964 (95% CI: 0.944 to 0.977, p<0.0001), respectively. Conclusion: ChatGPT outperformed Bard and Bing in answering case vignettes in physiology. Hence, students and teachers may think about choosing LLMs for their educational purposes accordingly for case-based learning in physiology. Further exploration of their capabilities is needed for adopting those in medical education and support for clinical decision-making.
format Online
Article
Text
id pubmed-10475852
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher Cureus
record_format MEDLINE/PubMed
spelling pubmed-10475852 2023-09-05 Performance of Large Language Models (ChatGPT, Bing Search, and Google Bard) in Solving Case Vignettes in Physiology Dhanvijay, Anup Kumar D Pinjar, Mohammed Jaffer Dhokane, Nitin Sorte, Smita R Kumari, Amita Mondal, Himel Cureus Medical Education Background: Large language models (LLMs) have emerged as powerful tools capable of processing and generating human-like text. These LLMs, such as ChatGPT (OpenAI Incorporated, Mission District, San Francisco, United States), Google Bard (Alphabet Inc., CA, US), and Microsoft Bing (Microsoft Corporation, WA, US), have been applied across various domains, demonstrating their potential to assist in solving complex tasks and improving information accessibility. However, their application in solving case vignettes in physiology has not been explored. This study aimed to assess the performance of three LLMs, namely, ChatGPT (3.5; free research version), Google Bard (Experiment), and Microsoft Bing (Precise), in answering case vignettes in physiology. Methods: This cross-sectional study was conducted in July 2023. A total of 77 case vignettes in physiology were prepared by two physiologists and were validated by two other content experts. These cases were presented to each LLM, and their responses were collected. Two physiologists independently rated the answers provided by the LLMs based on their accuracy. The ratings were measured on a scale from 0 to 4 according to the structure of the observed learning outcome (pre-structural = 0, uni-structural = 1, multi-structural = 2, relational = 3, extended-abstract = 4). The scores among the LLMs were compared by Friedman’s test, and inter-observer agreement was checked by the intraclass correlation coefficient (ICC). Results: The overall scores for ChatGPT, Bing, and Bard across the 77 cases were 3.19±0.3, 2.15±0.6, and 2.91±0.5, respectively (p<0.0001). Hence, ChatGPT 3.5 (free version) obtained the highest score, Bing (Precise) had the lowest score, and Bard (Experiment) fell in between the two in terms of performance. The average ICC values for ChatGPT, Bing, and Bard were 0.858 (95% CI: 0.777 to 0.91, p<0.0001), 0.975 (95% CI: 0.961 to 0.984, p<0.0001), and 0.964 (95% CI: 0.944 to 0.977, p<0.0001), respectively. Conclusion: ChatGPT outperformed Bard and Bing in answering case vignettes in physiology. Hence, students and teachers may think about choosing LLMs for their educational purposes accordingly for case-based learning in physiology. Further exploration of their capabilities is needed for adopting those in medical education and support for clinical decision-making. Cureus 2023-08-04 /pmc/articles/PMC10475852/ /pubmed/37671207 http://dx.doi.org/10.7759/cureus.42972 Text en Copyright © 2023, Dhanvijay et al. https://creativecommons.org/licenses/by/3.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
spellingShingle Medical Education
Dhanvijay, Anup Kumar D
Pinjar, Mohammed Jaffer
Dhokane, Nitin
Sorte, Smita R
Kumari, Amita
Mondal, Himel
Performance of Large Language Models (ChatGPT, Bing Search, and Google Bard) in Solving Case Vignettes in Physiology
title Performance of Large Language Models (ChatGPT, Bing Search, and Google Bard) in Solving Case Vignettes in Physiology
title_full Performance of Large Language Models (ChatGPT, Bing Search, and Google Bard) in Solving Case Vignettes in Physiology
title_fullStr Performance of Large Language Models (ChatGPT, Bing Search, and Google Bard) in Solving Case Vignettes in Physiology
title_full_unstemmed Performance of Large Language Models (ChatGPT, Bing Search, and Google Bard) in Solving Case Vignettes in Physiology
title_short Performance of Large Language Models (ChatGPT, Bing Search, and Google Bard) in Solving Case Vignettes in Physiology
title_sort performance of large language models (chatgpt, bing search, and google bard) in solving case vignettes in physiology
topic Medical Education
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10475852/
https://www.ncbi.nlm.nih.gov/pubmed/37671207
http://dx.doi.org/10.7759/cureus.42972
work_keys_str_mv AT dhanvijayanupkumard performanceoflargelanguagemodelschatgptbingsearchandgooglebardinsolvingcasevignettesinphysiology
AT pinjarmohammedjaffer performanceoflargelanguagemodelschatgptbingsearchandgooglebardinsolvingcasevignettesinphysiology
AT dhokanenitin performanceoflargelanguagemodelschatgptbingsearchandgooglebardinsolvingcasevignettesinphysiology
AT sortesmitar performanceoflargelanguagemodelschatgptbingsearchandgooglebardinsolvingcasevignettesinphysiology
AT kumariamita performanceoflargelanguagemodelschatgptbingsearchandgooglebardinsolvingcasevignettesinphysiology
AT mondalhimel performanceoflargelanguagemodelschatgptbingsearchandgooglebardinsolvingcasevignettesinphysiology