
“Application and accuracy of artificial intelligence-derived large language models in patients with age related macular degeneration”


Bibliographic Details
Main Authors: Ferro Desideri, Lorenzo, Roth, Janice, Zinkernagel, Martin, Anguita, Rodrigo
Format: Online Article Text
Language: English
Published: BioMed Central 2023
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10657493/
https://www.ncbi.nlm.nih.gov/pubmed/37980501
http://dx.doi.org/10.1186/s40942-023-00511-7
_version_ 1785148160854720512
author Ferro Desideri, Lorenzo
Roth, Janice
Zinkernagel, Martin
Anguita, Rodrigo
author_facet Ferro Desideri, Lorenzo
Roth, Janice
Zinkernagel, Martin
Anguita, Rodrigo
author_sort Ferro Desideri, Lorenzo
collection PubMed
description INTRODUCTION: Age-related macular degeneration (AMD) affects millions of people globally, leading to a surge in online research of putative diagnoses and causing potential misinformation and anxiety in patients and their caregivers. This study explores the efficacy of artificial intelligence-derived large language models (LLMs) in addressing AMD patients' questions. METHODS: ChatGPT 3.5 (2023), Bing AI (2023), and Google Bard (2023) were adopted as LLMs. Patients' questions were subdivided into two categories, (a) general medical advice and (b) pre- and post-intravitreal injection advice, and responses were classified as (1) accurate and sufficient, (2) partially accurate but sufficient, or (3) inaccurate and not sufficient. A non-parametric test was used to compare the mean scores of the three LLMs, and analysis of variance and reliability tests were performed among the three groups. RESULTS: In category (a), the average score was 1.20 (± 0.41) with ChatGPT 3.5, 1.60 (± 0.63) with Bing AI, and 1.60 (± 0.73) with Google Bard, showing no significant difference among the three groups (p = 0.129). In category (b), the average score was 1.07 (± 0.27) with ChatGPT 3.5, 1.69 (± 0.63) with Bing AI, and 1.38 (± 0.63) with Google Bard, showing a significant difference among the three groups (p = 0.0042). Reliability statistics showed a Cronbach's α of 0.237 (range 0.448, 0.096–0.544). CONCLUSION: ChatGPT 3.5 consistently offered the most accurate and satisfactory responses, particularly for technical queries. While the LLMs showed promise in providing precise information about AMD, further improvements are needed, especially for more technical questions.
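The analysis described in the abstract (a non-parametric comparison of the three models' scores plus a Cronbach's α reliability estimate) can be sketched as follows. This is a minimal illustration using hypothetical 1–3 ratings, not the study's actual data; the Kruskal–Wallis test stands in for the unnamed "non-parametric test".

```python
import numpy as np
from scipy import stats

# Hypothetical 1-3 ratings (1 = accurate and sufficient) for the same
# set of questions answered by three LLMs; not the study's real data.
chatgpt = np.array([1, 1, 2, 1, 1, 1, 2, 1, 1, 1])
bing    = np.array([2, 1, 2, 2, 1, 3, 2, 1, 1, 1])
bard    = np.array([2, 1, 1, 3, 1, 2, 2, 2, 1, 1])

# Non-parametric comparison of the three score distributions.
h, p = stats.kruskal(chatgpt, bing, bard)

def cronbach_alpha(items: np.ndarray) -> float:
    """Cronbach's alpha; `items` has shape (n_raters, n_observations)."""
    k = items.shape[0]
    item_vars = items.var(axis=1, ddof=1).sum()   # sum of per-rater variances
    total_var = items.sum(axis=0).var(ddof=1)     # variance of summed scores
    return k / (k - 1) * (1 - item_vars / total_var)

alpha = cronbach_alpha(np.vstack([chatgpt, bing, bard]))
print(f"Kruskal-Wallis H={h:.2f}, p={p:.3f}; Cronbach's alpha={alpha:.3f}")
```

A low α, as reported in the abstract (0.237), would indicate poor agreement among the three models' answer quality across questions.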
format Online
Article
Text
id pubmed-10657493
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-10657493 2023-11-18 “Application and accuracy of artificial intelligence-derived large language models in patients with age related macular degeneration” Ferro Desideri, Lorenzo Roth, Janice Zinkernagel, Martin Anguita, Rodrigo Int J Retina Vitreous Original Article
BioMed Central 2023-11-18 /pmc/articles/PMC10657493/ /pubmed/37980501 http://dx.doi.org/10.1186/s40942-023-00511-7 Text en © The Author(s) 2023 https://creativecommons.org/licenses/by/4.0/ Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit https://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (https://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
spellingShingle Original Article
Ferro Desideri, Lorenzo
Roth, Janice
Zinkernagel, Martin
Anguita, Rodrigo
“Application and accuracy of artificial intelligence-derived large language models in patients with age related macular degeneration”
title “Application and accuracy of artificial intelligence-derived large language models in patients with age related macular degeneration”
title_full “Application and accuracy of artificial intelligence-derived large language models in patients with age related macular degeneration”
title_fullStr “Application and accuracy of artificial intelligence-derived large language models in patients with age related macular degeneration”
title_full_unstemmed “Application and accuracy of artificial intelligence-derived large language models in patients with age related macular degeneration”
title_short “Application and accuracy of artificial intelligence-derived large language models in patients with age related macular degeneration”
title_sort “application and accuracy of artificial intelligence-derived large language models in patients with age related macular degeneration”
topic Original Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10657493/
https://www.ncbi.nlm.nih.gov/pubmed/37980501
http://dx.doi.org/10.1186/s40942-023-00511-7
work_keys_str_mv AT ferrodesiderilorenzo applicationandaccuracyofartificialintelligencederivedlargelanguagemodelsinpatientswithagerelatedmaculardegeneration
AT rothjanice applicationandaccuracyofartificialintelligencederivedlargelanguagemodelsinpatientswithagerelatedmaculardegeneration
AT zinkernagelmartin applicationandaccuracyofartificialintelligencederivedlargelanguagemodelsinpatientswithagerelatedmaculardegeneration
AT anguitarodrigo applicationandaccuracyofartificialintelligencederivedlargelanguagemodelsinpatientswithagerelatedmaculardegeneration