
Evaluating the limits of AI in medical specialisation: ChatGPT’s performance on the UK Neurology Specialty Certificate Examination

BACKGROUND: Large language models such as ChatGPT have demonstrated potential as innovative tools for medical education and practice, with studies showing their ability to perform at or near the passing threshold in general medical examinations and standardised admission tests. However, no studies have assessed their performance in the UK medical education context, particularly at a specialty level, and specifically in the field of neurology and neuroscience. METHODS: We evaluated the performance of ChatGPT in higher specialty training for neurology and neuroscience using 69 questions from the Pool—Specialty Certificate Examination (SCE) Neurology Web Questions bank. The dataset primarily focused on neurology (80%). The questions spanned subtopics such as symptoms and signs, diagnosis, interpretation and management, with some questions addressing specific patient populations. The performance of the ChatGPT 3.5 Legacy, ChatGPT 3.5 Default and ChatGPT-4 models was evaluated and compared. RESULTS: ChatGPT 3.5 Legacy and ChatGPT 3.5 Default displayed overall accuracies of 42% and 57%, respectively, falling short of the passing threshold of 58% for the 2022 SCE neurology examination. ChatGPT-4, on the other hand, achieved the highest accuracy of 64%, surpassing the passing threshold and outperforming its predecessors across disciplines and subtopics. CONCLUSIONS: The advancements in ChatGPT-4’s performance compared with its predecessors demonstrate the potential of artificial intelligence (AI) models in specialised medical education and practice. However, our findings also highlight the need for ongoing development and collaboration between AI developers and medical experts to ensure the models’ relevance and reliability in the rapidly evolving field of medicine.

Bibliographic Details
Main Author: Giannos, Panagiotis
Format: Online Article Text
Language: English
Published: BMJ Publishing Group, 2023
Subjects: Short Report
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10277081/
https://www.ncbi.nlm.nih.gov/pubmed/37337531
http://dx.doi.org/10.1136/bmjno-2023-000451
Journal: BMJ Neurol Open (Short Report)
Published online: 15 June 2023
Record: PubMed, pubmed-10277081 (MEDLINE/PubMed), National Center for Biotechnology Information
Rights: © Author(s) (or their employer(s)) 2023. Re-use permitted under CC BY-NC; no commercial re-use. Published by BMJ. Open access article distributed under the Creative Commons Attribution Non Commercial (CC BY-NC 4.0) licence: https://creativecommons.org/licenses/by-nc/4.0/