ChatGPT goes to the operating room: evaluating GPT-4 performance and its potential in surgical education and training in the era of large language models

PURPOSE: This study aimed to assess the performance of ChatGPT, specifically the GPT-3.5 and GPT-4 models, in understanding complex surgical clinical information and its potential implications for surgical education and training. METHODS: The dataset comprised 280 questions from the Korean general s...

Bibliographic Details
Main Authors:	Oh, Namkee; Choi, Gyu-Seong; Lee, Woo Yong
Format:	Online Article Text
Language:	English
Published:	The Korean Surgical Society 2023
Subjects:	Original Article
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10172028/ https://www.ncbi.nlm.nih.gov/pubmed/37179699 http://dx.doi.org/10.4174/astr.2023.104.5.269

author	Oh, Namkee; Choi, Gyu-Seong; Lee, Woo Yong
author_facet	Oh, Namkee; Choi, Gyu-Seong; Lee, Woo Yong
author_sort	Oh, Namkee
collection	PubMed
description	PURPOSE: This study aimed to assess the performance of ChatGPT, specifically the GPT-3.5 and GPT-4 models, in understanding complex surgical clinical information and its potential implications for surgical education and training. METHODS: The dataset comprised 280 questions from the Korean general surgery board exams conducted between 2020 and 2022. Both GPT-3.5 and GPT-4 models were evaluated, and their performances were compared using McNemar test. RESULTS: GPT-3.5 achieved an overall accuracy of 46.8%, while GPT-4 demonstrated a significant improvement with an overall accuracy of 76.4%, indicating a notable difference in performance between the models (P < 0.001). GPT-4 also exhibited consistent performance across all subspecialties, with accuracy rates ranging from 63.6% to 83.3%. CONCLUSION: ChatGPT, particularly GPT-4, demonstrates a remarkable ability to understand complex surgical clinical information, achieving an accuracy rate of 76.4% on the Korean general surgery board exam. However, it is important to recognize the limitations of large language models and ensure that they are used in conjunction with human expertise and judgment.
format	Online Article Text
id	pubmed-10172028
institution	National Center for Biotechnology Information
language	English
publishDate	2023
publisher	The Korean Surgical Society
record_format	MEDLINE/PubMed
spelling	pubmed-10172028 2023-05-12 ChatGPT goes to the operating room: evaluating GPT-4 performance and its potential in surgical education and training in the era of large language models Oh, Namkee; Choi, Gyu-Seong; Lee, Woo Yong Ann Surg Treat Res Original Article PURPOSE: This study aimed to assess the performance of ChatGPT, specifically the GPT-3.5 and GPT-4 models, in understanding complex surgical clinical information and its potential implications for surgical education and training. METHODS: The dataset comprised 280 questions from the Korean general surgery board exams conducted between 2020 and 2022. Both GPT-3.5 and GPT-4 models were evaluated, and their performances were compared using McNemar test. RESULTS: GPT-3.5 achieved an overall accuracy of 46.8%, while GPT-4 demonstrated a significant improvement with an overall accuracy of 76.4%, indicating a notable difference in performance between the models (P < 0.001). GPT-4 also exhibited consistent performance across all subspecialties, with accuracy rates ranging from 63.6% to 83.3%. CONCLUSION: ChatGPT, particularly GPT-4, demonstrates a remarkable ability to understand complex surgical clinical information, achieving an accuracy rate of 76.4% on the Korean general surgery board exam. However, it is important to recognize the limitations of large language models and ensure that they are used in conjunction with human expertise and judgment. The Korean Surgical Society 2023-05 2023-04-28 /pmc/articles/PMC10172028/ /pubmed/37179699 http://dx.doi.org/10.4174/astr.2023.104.5.269 Text en Copyright © 2023, the Korean Surgical Society https://creativecommons.org/licenses/by-nc/4.0/ Annals of Surgical Treatment and Research is an Open Access Journal. All articles are distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/4.0/), which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.
title	ChatGPT goes to the operating room: evaluating GPT-4 performance and its potential in surgical education and training in the era of large language models
topic	Original Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10172028/ https://www.ncbi.nlm.nih.gov/pubmed/37179699 http://dx.doi.org/10.4174/astr.2023.104.5.269

Similar Items