
Assessment of ChatGPT’s performance on neurology written board examination questions

BACKGROUND AND OBJECTIVES: ChatGPT has shown promise in healthcare. To assess the utility of this novel tool in healthcare education, we evaluated ChatGPT’s performance in answering neurology board exam questions.

METHODS: Neurology board-style examination questions were accessed from BoardVitals, a commercial neurology question bank. ChatGPT was provided the full question prompt and the multiple answer choices, and was given up to three attempts to select the correct answer. A total of 560 questions (14 blocks of 40 questions) were used; image-based questions were excluded because ChatGPT cannot process visual input. The artificial intelligence (AI) answers were then compared with human user data provided by the question bank to gauge performance.

RESULTS: Of 509 eligible questions across 14 question blocks, ChatGPT answered 335 (65.8%) correctly on the first attempt and 383 (75.3%) within three attempts, scoring at approximately the 26th and 50th percentiles, respectively. The highest-performing subjects were pain (100%), epilepsy & seizures (85%) and genetics (82%), while the lowest-performing subjects were imaging/diagnostic studies (27%), critical care (41%) and cranial nerves (48%).

DISCUSSION: This study found that ChatGPT performed similarly to its human counterparts. The accuracy of the AI increased with multiple attempts, and its performance fell within the expected range of neurology resident learners. These results demonstrate ChatGPT’s potential in processing specialised medical information. Future studies should better define the scope to which AI can be integrated into medical decision making.
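
The abstract does not say how the prompts were submitted, but the protocol it describes (full question stem plus answer choices, up to three attempts per question, accuracy tallied across blocks) maps naturally onto a short script. Below is a minimal sketch in Python, assuming the OpenAI chat completions API as a stand-in for the ChatGPT interface; the Question structure, model name, re-prompt wording and answer parsing are illustrative assumptions, not details taken from the study.

```python
# Hypothetical sketch of the three-attempt grading protocol described in the
# abstract. Assumes the OpenAI Python SDK (pip install openai); the question
# format and prompts are illustrative, not taken from the study.
from dataclasses import dataclass
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment


@dataclass
class Question:
    stem: str
    choices: dict[str, str]  # e.g. {"A": "...", "B": "...", "C": "...", "D": "..."}
    answer: str              # correct choice letter from the question bank's key


def ask_once(q: Question, prior_picks: list[str]) -> str:
    """Submit the full stem and choices; return the letter the model picks."""
    choice_text = "\n".join(f"{k}. {v}" for k, v in q.choices.items())
    messages = [{
        "role": "user",
        "content": (f"{q.stem}\n{choice_text}\n"
                    "Answer with the single letter of the best choice."),
    }]
    # Re-prompting after a wrong answer mirrors the study's "additional
    # attempts"; the exact re-prompt wording here is an assumption.
    for wrong in prior_picks:
        messages.append({"role": "assistant", "content": wrong})
        messages.append({"role": "user",
                         "content": "That is incorrect. Please choose again."})
    resp = client.chat.completions.create(model="gpt-3.5-turbo",
                                          messages=messages)
    reply = resp.choices[0].message.content.strip()
    return reply[0].upper()  # naive parse: first character as the chosen letter


def grade(questions: list[Question], max_attempts: int = 3) -> tuple[int, int]:
    """Return (# correct on first attempt, # correct within max_attempts)."""
    first, within = 0, 0
    for q in questions:
        prior_picks: list[str] = []
        for attempt in range(1, max_attempts + 1):
            pick = ask_once(q, prior_picks)
            if pick == q.answer:
                if attempt == 1:
                    first += 1
                within += 1
                break
            prior_picks.append(pick)
    return first, within
```

Run over the study’s 509 text-only questions, grade() would yield the two headline metrics the paper reports (first-attempt and within-three-attempts accuracy); the actual numbers would depend on the model version, prompt wording and answer parsing.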

Bibliographic Details
Main Authors: Chen, Tse Chiang; Multala, Evan; Kearns, Patrick; Delashaw, Johnny; Dumont, Aaron; Maraganore, Demetrius; Wang, Arthur
Format: Online Article Text
Language: English
Published: BMJ Publishing Group, 2023
Subjects: Original Research
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10626870/
https://www.ncbi.nlm.nih.gov/pubmed/37936648
http://dx.doi.org/10.1136/bmjno-2023-000530
Collection: PubMed
Record ID: pubmed-10626870
Institution: National Center for Biotechnology Information
Record Format: MEDLINE/PubMed
Journal: BMJ Neurol Open (Original Research)
Published Online: 2 November 2023
License: © Author(s) (or their employer(s)) 2023. Re-use permitted under CC BY-NC; no commercial re-use. Published by BMJ. This is an open access article distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 4.0) license (https://creativecommons.org/licenses/by-nc/4.0/), which permits others to distribute, remix, adapt and build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited, appropriate credit is given, any changes made are indicated, and the use is non-commercial.