
Artificial Intelligence in Ophthalmology: A Comparative Analysis of GPT-3.5, GPT-4, and Human Expertise in Answering StatPearls Questions

Importance: Chat Generative Pre-Trained Transformer (ChatGPT) has shown promising performance in various fields, including medicine, business, and law, but its accuracy in specialty-specific medical questions, particularly in ophthalmology, is still uncertain.

Purpose: This study evaluates the performance of two ChatGPT models (GPT-3.5 and GPT-4) and human professionals in answering ophthalmology questions from the StatPearls question bank, assessing their outcomes, and providing insights into the integration of artificial intelligence (AI) technology in ophthalmology.

Methods: ChatGPT's performance was evaluated using 467 ophthalmology questions from the StatPearls question bank. These questions were stratified into 11 subcategories, four difficulty levels, and three generalized anatomical categories. The answer accuracy of GPT-3.5, GPT-4, and human participants was assessed. Statistical analysis was conducted via the Kolmogorov-Smirnov test for normality, one-way analysis of variance (ANOVA) for the statistical significance of GPT-3.5 versus GPT-4 versus human performance, and repeated unpaired two-sample t-tests to compare the means of two groups.

Results: GPT-4 outperformed both GPT-3.5 and human professionals on ophthalmology StatPearls questions, except in the "Lens and Cataract" category. The performance differences were statistically significant overall, with GPT-4 achieving higher accuracy (73.2%) compared to GPT-3.5 (55.5%, p < 0.001) and humans (58.3%, p < 0.001). There were variations in performance across difficulty levels (rated one to four), but GPT-4 consistently performed better than both GPT-3.5 and humans on level-two, -three, and -four questions. On questions of level-four difficulty, human performance significantly exceeded that of GPT-3.5 (p = 0.008).

Conclusion: The study's findings demonstrate GPT-4's significant performance improvements over GPT-3.5 and human professionals on StatPearls ophthalmology questions. Our results highlight the potential of advanced conversational AI systems to be utilized as important tools in the education and practice of medicine.

Bibliographic Details
Main Authors: Moshirfar, Majid; Altaf, Amal W; Stoakes, Isabella M; Tuttle, Jared J; Hoopes, Phillip C
Format: Online Article (Text)
Language: English
Published: Cureus, 2023
Subjects: Ophthalmology
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10362981/
https://www.ncbi.nlm.nih.gov/pubmed/37485215
http://dx.doi.org/10.7759/cureus.40822
collection PubMed
id pubmed-10362981
institution National Center for Biotechnology Information
record_format MEDLINE/PubMed
publishDate 2023-06-22
Copyright © 2023, Moshirfar et al. This is an open access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/3.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
topic Ophthalmology
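
The Methods summarized in the abstract above name three statistical procedures: a Kolmogorov-Smirnov test for normality, a one-way ANOVA across the three responder groups, and unpaired two-sample t-tests for pairwise comparisons. Below is a minimal sketch, in Python with scipy.stats, of how such an analysis could be run; this is not the authors' code, and the accuracy arrays are illustrative placeholders rather than the study's data.

# Minimal sketch (not the authors' code) of the statistical workflow described
# in the abstract: Kolmogorov-Smirnov normality check, one-way ANOVA across the
# three responder groups, and pairwise unpaired two-sample t-tests.
# The accuracy values below are illustrative placeholders, not study data.
import numpy as np
from scipy import stats

# Hypothetical per-subcategory accuracy (proportion correct) for each group,
# one value per StatPearls subcategory (the study used 11 subcategories).
gpt35 = np.array([0.52, 0.58, 0.50, 0.61, 0.55, 0.54, 0.57, 0.53, 0.60, 0.56, 0.55])
gpt4  = np.array([0.70, 0.75, 0.68, 0.77, 0.72, 0.74, 0.71, 0.73, 0.76, 0.72, 0.74])
human = np.array([0.56, 0.60, 0.55, 0.63, 0.58, 0.57, 0.59, 0.56, 0.61, 0.58, 0.59])

# Kolmogorov-Smirnov test for normality of each group (against a normal
# distribution parameterized by the sample mean and standard deviation).
for name, sample in [("GPT-3.5", gpt35), ("GPT-4", gpt4), ("Human", human)]:
    stat, p = stats.kstest(sample, "norm", args=(sample.mean(), sample.std(ddof=1)))
    print(f"KS normality {name}: D={stat:.3f}, p={p:.3f}")

# One-way ANOVA for an overall difference among the three groups.
f_stat, p_anova = stats.f_oneway(gpt35, gpt4, human)
print(f"ANOVA: F={f_stat:.3f}, p={p_anova:.4f}")

# Unpaired two-sample t-tests for each pair of groups.
pairs = [(("GPT-4", gpt4), ("GPT-3.5", gpt35)),
         (("GPT-4", gpt4), ("Human", human)),
         (("Human", human), ("GPT-3.5", gpt35))]
for (a_name, a), (b_name, b) in pairs:
    t, p = stats.ttest_ind(a, b)
    print(f"t-test {a_name} vs {b_name}: t={t:.3f}, p={p:.4f}")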