Evaluating the Artificial Intelligence Performance Growth in Ophthalmic Knowledge
Objective: We aim to compare the capabilities of Chat Generative Pre-Trained Transformer (ChatGPT)-3.5 and ChatGPT-4.0 (OpenAI, San Francisco, CA, USA) in addressing multiple-choice ophthalmic case challenges. Methods and analysis: Both models’ accuracy was compared across different ophthalmology subspecialties using multiple-choice ophthalmic clinical cases provided by the American Academy of Ophthalmology (AAO) “Diagnose This” questions. Additional analysis was based on image content, question difficulty, character length of the models’ responses, and the models’ alignment with responses from human respondents. The χ² test, Fisher’s exact test, Student’s t-test, and one-way analysis of variance (ANOVA) were conducted where appropriate, with p<0.05 considered significant. Results: GPT-4.0 significantly outperformed GPT-3.5 (75% versus 46%, p<0.01), with the most noticeable improvement in neuro-ophthalmology (100% versus 38%, p=0.03). While both models struggled with uveitis and refractive questions, GPT-4.0 excelled in other areas, such as pediatric questions (82%). In image-related questions, GPT-4.0 also displayed superior accuracy that trended toward significance (73% versus 46%, p=0.07). GPT-4.0 performed better with easier questions (93.8% (least difficult) versus 76.2% (middle) versus 53.3% (most), p=0.03) and generated more concise answers than GPT-3.5 (651.7±342.9 versus 1,112.9±328.8 characters, p<0.01). Moreover, GPT-4.0’s answers were more in line with those of AAO respondents (57.3% versus 41.4%, p<0.01), showing a strong correlation between its accuracy and the proportion of AAO respondents who selected GPT-4.0’s answer (ρ=0.713, p<0.01). Conclusion and relevance: Our study demonstrated that GPT-4.0 significantly outperforms GPT-3.5 in addressing ophthalmic case challenges, especially in neuro-ophthalmology, with improved accuracy even in image-related questions. These findings underscore the potential of advancing artificial intelligence (AI) models in enhancing ophthalmic diagnostics and medical education.
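For readers who want to see how comparisons like these are computed, below is a minimal sketch (not the authors' code) of the tests named in the methods, using SciPy: a χ² test on overall accuracy, Fisher's exact test for a small subspecialty cell, and a Spearman correlation between model correctness and respondent agreement. All raw counts and arrays are hypothetical reconstructions from the published percentages; the abstract reports only summary statistics, so question totals and per-question values are assumptions.

```python
# A minimal sketch of the statistical comparisons described in the
# abstract, using SciPy. All counts are illustrative assumptions
# back-filled from the reported percentages, not the study's raw data.
import numpy as np
from scipy.stats import chi2_contingency, fisher_exact, spearmanr

# Overall accuracy (GPT-4.0 75% vs. GPT-3.5 46%): assume a hypothetical
# pool of 52 questions, giving correct/incorrect counts per model.
overall = np.array([
    [39, 13],  # GPT-4.0: correct, incorrect
    [24, 28],  # GPT-3.5: correct, incorrect
])
chi2, p, dof, _ = chi2_contingency(overall)
print(f"chi-square test: chi2={chi2:.2f}, dof={dof}, p={p:.4f}")

# Small subspecialty cells (e.g., neuro-ophthalmology, 100% vs. 38%)
# violate the chi-square expected-count assumption, so Fisher's exact
# test is used instead; n=8 questions per model is an assumption.
neuro = [[8, 0],  # GPT-4.0: correct, incorrect
         [3, 5]]  # GPT-3.5: correct, incorrect
_, p_fisher = fisher_exact(neuro)
print(f"Fisher's exact test: p={p_fisher:.4f}")

# Spearman correlation between GPT-4.0 correctness (1 = correct) and the
# share of AAO respondents who chose GPT-4.0's answer (values invented).
gpt4_correct = [1, 1, 0, 1, 0, 1, 1, 0]
aao_share = [0.81, 0.64, 0.22, 0.73, 0.31, 0.58, 0.69, 0.18]
rho, p_rho = spearmanr(gpt4_correct, aao_share)
print(f"Spearman: rho={rho:.3f}, p={p_rho:.4f}")
```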
Main Authors: | Jiao, Cheng; Edupuganti, Neel R; Patel, Parth A; Bui, Tommy; Sheth, Veeral |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Cureus, 2023 |
Subjects: | Medical Education |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10590143/ https://www.ncbi.nlm.nih.gov/pubmed/37868408 http://dx.doi.org/10.7759/cureus.45700 |
author | Jiao, Cheng; Edupuganti, Neel R; Patel, Parth A; Bui, Tommy; Sheth, Veeral |
---|---|
collection | PubMed |
format | Online Article Text |
id | pubmed-10590143 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2023 |
publisher | Cureus |
record_format | MEDLINE/PubMed |
spelling | pubmed-10590143 2023-10-22. Evaluating the Artificial Intelligence Performance Growth in Ophthalmic Knowledge. Jiao, Cheng; Edupuganti, Neel R; Patel, Parth A; Bui, Tommy; Sheth, Veeral. Cureus, Medical Education. Published 2023-09-21. /pmc/articles/PMC10590143/ /pubmed/37868408 http://dx.doi.org/10.7759/cureus.45700 Text en Copyright © 2023, Jiao et al. This is an open access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/3.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. |
title | Evaluating the Artificial Intelligence Performance Growth in Ophthalmic Knowledge |
topic | Medical Education |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10590143/ https://www.ncbi.nlm.nih.gov/pubmed/37868408 http://dx.doi.org/10.7759/cureus.45700 |