
“Application and accuracy of artificial intelligence-derived large language models in patients with age related macular degeneration”


Bibliographic Details
Main Authors: Ferro Desideri, Lorenzo, Roth, Janice, Zinkernagel, Martin, Anguita, Rodrigo
Format: Online Article Text
Language: English
Published: BioMed Central 2023
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10657493/
https://www.ncbi.nlm.nih.gov/pubmed/37980501
http://dx.doi.org/10.1186/s40942-023-00511-7
_version_ 1785148160854720512
author Ferro Desideri, Lorenzo
Roth, Janice
Zinkernagel, Martin
Anguita, Rodrigo
author_facet Ferro Desideri, Lorenzo
Roth, Janice
Zinkernagel, Martin
Anguita, Rodrigo
author_sort Ferro Desideri, Lorenzo
collection PubMed
description INTRODUCTION: Age-related macular degeneration (AMD) affects millions of people globally, leading to a surge in online research of putative diagnoses and causing potential misinformation and anxiety in patients and their caregivers. This study explores the efficacy of artificial intelligence-derived large language models (LLMs) in addressing AMD patients' questions. METHODS: ChatGPT 3.5 (2023), Bing AI (2023), and Google Bard (2023) were adopted as LLMs. Patients' questions were subdivided into two categories, (a) general medical advice and (b) pre- and post-intravitreal injection advice, and responses were classified as (1) accurate and sufficient, (2) partially accurate but sufficient, or (3) inaccurate and not sufficient. A non-parametric test was used to compare the mean scores of the three LLMs, and analysis of variance and reliability tests were performed among the three groups. RESULTS: In category (a), the average score was 1.20 (± 0.41) with ChatGPT 3.5, 1.60 (± 0.63) with Bing AI, and 1.60 (± 0.73) with Google Bard, showing no significant difference among the three groups (p = 0.129). In category (b), the average score was 1.07 (± 0.27) with ChatGPT 3.5, 1.69 (± 0.63) with Bing AI, and 1.38 (± 0.63) with Google Bard, showing a significant difference among the three groups (p = 0.0042). Reliability statistics showed a Cronbach's α of 0.237 (range 0.448, 0.096–0.544). CONCLUSION: ChatGPT 3.5 consistently offered the most accurate and satisfactory responses, particularly for technical queries. While the LLMs showed promise in providing precise information about AMD, further improvements are needed, especially for more technical questions.
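The analysis described in the abstract (a non-parametric comparison of the three models' scores plus a Cronbach's α reliability estimate) can be sketched as follows. This is a minimal illustration using hypothetical 1–3 ratings, not the study's actual data; the Kruskal–Wallis test stands in for the unnamed "non-parametric test".

```python
import numpy as np
from scipy import stats

# Hypothetical 1-3 ratings (1 = accurate and sufficient) for the same
# set of questions answered by three LLMs; not the study's real data.
chatgpt = np.array([1, 1, 2, 1, 1, 1, 2, 1, 1, 1])
bing    = np.array([2, 1, 2, 2, 1, 3, 2, 1, 1, 1])
bard    = np.array([2, 1, 1, 3, 1, 2, 2, 2, 1, 1])

# Non-parametric comparison of the three score distributions.
h, p = stats.kruskal(chatgpt, bing, bard)

def cronbach_alpha(items: np.ndarray) -> float:
    """Cronbach's alpha; `items` has shape (n_raters, n_observations)."""
    k = items.shape[0]
    item_vars = items.var(axis=1, ddof=1).sum()   # sum of per-rater variances
    total_var = items.sum(axis=0).var(ddof=1)     # variance of summed scores
    return k / (k - 1) * (1 - item_vars / total_var)

alpha = cronbach_alpha(np.vstack([chatgpt, bing, bard]))
print(f"Kruskal-Wallis H={h:.2f}, p={p:.3f}; Cronbach's alpha={alpha:.3f}")
```

A low α, as reported in the abstract (0.237), would indicate poor agreement among the three models' answer quality across questions.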
format Online
Article
Text
id pubmed-10657493
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-10657493 2023-11-18 “Application and accuracy of artificial intelligence-derived large language models in patients with age related macular degeneration” Ferro Desideri, Lorenzo Roth, Janice Zinkernagel, Martin Anguita, Rodrigo Int J Retina Vitreous Original Article
BioMed Central 2023-11-18 /pmc/articles/PMC10657493/ /pubmed/37980501 http://dx.doi.org/10.1186/s40942-023-00511-7 Text en © The Author(s) 2023 https://creativecommons.org/licenses/by/4.0/ Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit https://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (https://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
spellingShingle Original Article
Ferro Desideri, Lorenzo
Roth, Janice
Zinkernagel, Martin
Anguita, Rodrigo
“Application and accuracy of artificial intelligence-derived large language models in patients with age related macular degeneration”
title “Application and accuracy of artificial intelligence-derived large language models in patients with age related macular degeneration”
title_full “Application and accuracy of artificial intelligence-derived large language models in patients with age related macular degeneration”
title_fullStr “Application and accuracy of artificial intelligence-derived large language models in patients with age related macular degeneration”
title_full_unstemmed “Application and accuracy of artificial intelligence-derived large language models in patients with age related macular degeneration”
title_short “Application and accuracy of artificial intelligence-derived large language models in patients with age related macular degeneration”
title_sort “application and accuracy of artificial intelligence-derived large language models in patients with age related macular degeneration”
topic Original Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10657493/
https://www.ncbi.nlm.nih.gov/pubmed/37980501
http://dx.doi.org/10.1186/s40942-023-00511-7
work_keys_str_mv AT ferrodesiderilorenzo applicationandaccuracyofartificialintelligencederivedlargelanguagemodelsinpatientswithagerelatedmaculardegeneration
AT rothjanice applicationandaccuracyofartificialintelligencederivedlargelanguagemodelsinpatientswithagerelatedmaculardegeneration
AT zinkernagelmartin applicationandaccuracyofartificialintelligencederivedlargelanguagemodelsinpatientswithagerelatedmaculardegeneration
AT anguitarodrigo applicationandaccuracyofartificialintelligencederivedlargelanguagemodelsinpatientswithagerelatedmaculardegeneration