Assessment of Artificial Intelligence Performance on the Otolaryngology Residency In‐Service Exam


Bibliographic Details
Main Authors: Mahajan, Arushi P., Shabet, Christina L., Smith, Joshua, Rudy, Shannon F., Kupfer, Robbi A., Bohm, Lauren A.
Format: Online Article Text
Language: English
Published: John Wiley and Sons Inc. 2023
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10687376/
https://www.ncbi.nlm.nih.gov/pubmed/38034065
http://dx.doi.org/10.1002/oto2.98
Description
Summary: OBJECTIVES: This study seeks to determine the potential use and reliability of a large language learning model for answering questions in a sub‐specialized area of medicine, specifically practice exam questions in otolaryngology–head and neck surgery, and to assess its current efficacy for surgical trainees and learners. STUDY DESIGN AND SETTING: All available questions from a public, paid‐access question bank were manually input through ChatGPT. METHODS: Outputs from ChatGPT were compared against the benchmark of the answers and explanations from the question bank. Questions were assessed in 2 domains: accuracy and comprehensiveness of explanations. RESULTS: Overall, our study demonstrates a ChatGPT correct answer rate of 53% and a correct explanation rate of 54%. We find that as question difficulty increases, answer and explanation accuracy decrease. CONCLUSION: Currently, artificial intelligence‐driven learning platforms are not robust enough to be reliable medical education resources to assist learners in sub‐specialty‐specific patient decision‐making scenarios.