Evaluating ChatGPT Performance on the Orthopaedic In-Training Examination

Bibliographic Details
Main Authors:	Kung, Justin E.; Marshall, Christopher; Gauthier, Chase; Gonzalez, Tyler A.; Jackson, J. Benjamin
Format:	Online Article Text
Language:	English
Published:	Journal of Bone and Joint Surgery, Inc. 2023
Subjects:	AOA Critical Issues in Education
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10484364/ https://www.ncbi.nlm.nih.gov/pubmed/37693092 http://dx.doi.org/10.2106/JBJS.OA.23.00056

collection	PubMed
description	BACKGROUND: Artificial intelligence (AI) holds potential in improving medical education and healthcare delivery. ChatGPT is a state-of-the-art natural language processing AI model which has shown impressive capabilities, scoring in the top percentiles on numerous standardized examinations, including the Uniform Bar Exam and Scholastic Aptitude Test. The goal of this study was to evaluate ChatGPT performance on the Orthopaedic In-Training Examination (OITE), an assessment of medical knowledge for orthopedic residents.
METHODS: OITE 2020, 2021, and 2022 questions without images were inputted into ChatGPT version 3.5 and version 4 (GPT-4) with zero prompting. The performance of ChatGPT was evaluated as a percentage of correct responses and compared with the national average of orthopedic surgery residents at each postgraduate year (PGY) level. ChatGPT was asked to provide a source for its answer, which was categorized as being a journal article, book, or website, and if the source could be verified. Impact factor for the journal cited was also recorded.
RESULTS: ChatGPT answered 196 of 360 answers correctly (54.3%), corresponding to a PGY-1 level. ChatGPT cited a verifiable source in 47.2% of questions, with an average median journal impact factor of 5.4. GPT-4 answered 265 of 360 questions correctly (73.6%), corresponding to the average performance of a PGY-5 and exceeding the corresponding passing score for the American Board of Orthopaedic Surgery Part I Examination of 67%. GPT-4 cited a verifiable source in 87.9% of questions, with an average median journal impact factor of 5.2.
CONCLUSIONS: ChatGPT performed above the average PGY-1 level and GPT-4 performed better than the average PGY-5 level, showing major improvement. Further investigation is needed to determine how successive versions of ChatGPT would perform and how to optimize this technology to improve medical education.
CLINICAL RELEVANCE: AI has the potential to aid in medical education and healthcare delivery.
id	pubmed-10484364
institution	National Center for Biotechnology Information
record_format	MEDLINE/PubMed
journal	JB JS Open Access
published	2023-09-08
rights	Copyright © 2023 The Authors. Published by The Journal of Bone and Joint Surgery, Incorporated. All rights reserved. This is an open access article distributed under the terms of the Creative Commons Attribution-Non Commercial-No Derivatives License 4.0 (https://creativecommons.org/licenses/by-nc-nd/4.0/) (CCBY-NC-ND), where it is permissible to download and share the work provided it is properly cited. The work cannot be changed in any way or used commercially without permission from the journal.
