Evaluating the Artificial Intelligence Performance Growth in Ophthalmic Knowledge

Objective: We aim to compare the capabilities of Chat Generative Pre-Trained Transformer (ChatGPT)-3.5 and ChatGPT-4.0 (OpenAI, San Francisco, CA, USA) in addressing multiple-choice ophthalmic case challenges.

Methods and analysis: Both models’ accuracy was compared across different ophthalmology subspecialties using multiple-choice ophthalmic clinical cases provided by the American Academy of Ophthalmology (AAO) “Diagnose This” questions. Additional analysis was based on image content, question difficulty, character length of the models’ responses, and the models’ alignment with responses from human respondents. The χ² test, Fisher’s exact test, Student’s t-test, and one-way analysis of variance (ANOVA) were conducted where appropriate, with p<0.05 considered significant.

Results: GPT-4.0 significantly outperformed GPT-3.5 (75% versus 46%, p<0.01), with the most noticeable improvement in neuro-ophthalmology (100% versus 38%, p=0.03). While both models struggled with uveitis and refractive questions, GPT-4.0 excelled in other areas, such as pediatric questions (82%). In image-related questions, GPT-4.0 also displayed superior accuracy that trended toward significance (73% versus 46%, p=0.07). GPT-4.0 performed better on easier questions (93.8% (least difficult) versus 76.2% (middle) versus 53.3% (most), p=0.03) and generated more concise answers than GPT-3.5 (651.7±342.9 versus 1,112.9±328.8 characters, p<0.01). Moreover, GPT-4.0’s answers were more in line with those of AAO respondents (57.3% versus 41.4%, p<0.01), showing a strong correlation between its accuracy and the proportion of AAO respondents who selected GPT-4.0’s answer (ρ=0.713, p<0.01).

Conclusion and relevance: Our study demonstrated that GPT-4.0 significantly outperforms GPT-3.5 in addressing ophthalmic case challenges, especially in neuro-ophthalmology, with improved accuracy even in image-related questions. These findings underscore the potential of advancing artificial intelligence (AI) models in enhancing ophthalmic diagnostics and medical education.
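
The abstract reports the accuracy comparison only as percentages (75% versus 46%) with χ² and Fisher’s exact tests. As a minimal sketch of that style of analysis (not the authors’ code), the snippet below assumes a hypothetical count of 48 questions per model, since the abstract does not state the sample size.

```python
# Minimal sketch of the proportion comparison described in the abstract.
# NOTE: the question count below is a hypothetical placeholder; only the
# percentages (75% vs. 46% correct), not the sample size, are reported.
from scipy.stats import chi2_contingency, fisher_exact

n_questions = 48                                  # hypothetical total per model
gpt4_correct = round(0.75 * n_questions)          # 75% accuracy reported for GPT-4.0
gpt35_correct = round(0.46 * n_questions)         # 46% accuracy reported for GPT-3.5

# 2x2 contingency table: rows = model, columns = (correct, incorrect)
table = [
    [gpt4_correct, n_questions - gpt4_correct],
    [gpt35_correct, n_questions - gpt35_correct],
]

chi2, p_chi2, dof, expected = chi2_contingency(table)
odds_ratio, p_fisher = fisher_exact(table)        # Fisher's exact test suits small expected counts

print(f"chi-square p = {p_chi2:.3f}, Fisher exact p = {p_fisher:.3f}")
```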

Bibliographic Details
Main Authors: Jiao, Cheng; Edupuganti, Neel R; Patel, Parth A; Bui, Tommy; Sheth, Veeral
Format: Online Article Text
Language: English
Published: Cureus, 2023-09-21
Subjects: Medical Education
Collection: PubMed (record pubmed-10590143, MEDLINE/PubMed format, National Center for Biotechnology Information)
Rights: Copyright © 2023, Jiao et al. This is an open access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/3.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10590143/
https://www.ncbi.nlm.nih.gov/pubmed/37868408
http://dx.doi.org/10.7759/cureus.45700