Assessment of ChatGPT’s performance on neurology written board examination questions
Main Authors: Chen, Tse Chiang; Multala, Evan; Kearns, Patrick; Delashaw, Johnny; Dumont, Aaron; Maraganore, Demetrius; Wang, Arthur
Format: Online Article Text
Language: English
Published: BMJ Publishing Group, 2023
Subjects: Original Research
Online Access:
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10626870/
https://www.ncbi.nlm.nih.gov/pubmed/37936648
http://dx.doi.org/10.1136/bmjno-2023-000530
Collection: PubMed
Abstract:
BACKGROUND AND OBJECTIVES: ChatGPT has shown promise in healthcare. To assess the utility of this novel tool in healthcare education, we evaluated ChatGPT’s performance in answering neurology board exam questions.
METHODS: Neurology board-style examination questions were accessed from BoardVitals, a commercial neurology question bank. ChatGPT was provided a full question prompt and multiple answer choices, and was given a first attempt and, if incorrect, up to three attempts in total to select the correct answer. A total of 560 questions (14 blocks of 40 questions) were used, although image-based questions were disregarded due to ChatGPT’s inability to process visual input. The artificial intelligence (AI) answers were then compared with human user data provided by the question bank to gauge its performance.
RESULTS: Of the 509 eligible questions across the 14 question blocks, ChatGPT correctly answered 335 questions (65.8%) on the first attempt and 383 (75.3%) within three attempts, scoring at approximately the 26th and 50th percentiles, respectively. The highest-performing subjects were pain (100%), epilepsy & seizures (85%) and genetics (82%), while the lowest-performing subjects were imaging/diagnostic studies (27%), critical care (41%) and cranial nerves (48%).
DISCUSSION: This study found that ChatGPT performed similarly to its human counterparts. The AI’s accuracy increased with multiple attempts, and its performance fell within the expected range of neurology resident learners. This study demonstrates ChatGPT’s potential in processing specialised medical information. Future studies should better define the scope to which AI can be integrated into medical decision making.
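The percentages reported in the RESULTS section follow directly from the stated counts; as a quick consistency check (all figures taken from the abstract above, no new data introduced):

```latex
% Consistency check of the figures reported in the abstract.
\begin{align*}
560 - 509 &= 51 && \text{image-based questions excluded} \\
335 / 509 &\approx 0.658 = 65.8\% && \text{first-attempt accuracy} \\
383 / 509 &\approx 0.753 = 75.3\% && \text{accuracy within three attempts}
\end{align*}
```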
Record ID: pubmed-10626870
Institution: National Center for Biotechnology Information
Record Format: MEDLINE/PubMed
Journal: BMJ Neurol Open (Original Research)
Published Online: 2023-11-02
License: © Author(s) (or their employer(s)) 2023. Re-use permitted under CC BY-NC; no commercial re-use. Published by BMJ. This is an open access article distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 4.0) licence, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited, appropriate credit is given, any changes made are indicated, and the use is non-commercial. See: https://creativecommons.org/licenses/by-nc/4.0/