“Application and accuracy of artificial intelligence-derived large language models in patients with age related macular degeneration”
Main authors: | Ferro Desideri, Lorenzo; Roth, Janice; Zinkernagel, Martin; Anguita, Rodrigo |
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central, 2023 |
Subjects: | Original Article |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10657493/ https://www.ncbi.nlm.nih.gov/pubmed/37980501 http://dx.doi.org/10.1186/s40942-023-00511-7 |
_version_ | 1785148160854720512 |
author | Ferro Desideri, Lorenzo; Roth, Janice; Zinkernagel, Martin; Anguita, Rodrigo
author_facet | Ferro Desideri, Lorenzo; Roth, Janice; Zinkernagel, Martin; Anguita, Rodrigo
author_sort | Ferro Desideri, Lorenzo |
collection | PubMed |
description | INTRODUCTION: Age-related macular degeneration (AMD) affects millions of people globally, leading to a surge in online research of putative diagnoses and causing potential misinformation and anxiety in patients and their relatives. This study explores the efficacy of artificial intelligence-derived large language models (LLMs) in addressing AMD patients' questions. METHODS: ChatGPT 3.5 (2023), Bing AI (2023), and Google Bard (2023) were adopted as LLMs. Patients' questions were subdivided into two categories, (a) general medical advice and (b) pre- and post-intravitreal injection advice, and the responses were classified as (1) accurate and sufficient, (2) partially accurate but sufficient, and (3) inaccurate and not sufficient. A non-parametric test was used to compare mean scores between the 3 LLMs, and an analysis of variance and reliability tests were performed among the 3 groups. RESULTS: In question category (a), the average score was 1.20 (± 0.41) with ChatGPT 3.5, 1.60 (± 0.63) with Bing AI and 1.60 (± 0.73) with Google Bard, showing no significant differences among the 3 groups (p = 0.129). The average score in category (b) was 1.07 (± 0.27) with ChatGPT 3.5, 1.69 (± 0.63) with Bing AI and 1.38 (± 0.63) with Google Bard, showing a significant difference among the 3 groups (p = 0.0042). Reliability statistics showed a Cronbach's α of 0.237 (range 0.448, 0.096–0.544). CONCLUSION: ChatGPT 3.5 consistently offered the most accurate and satisfactory responses, particularly for technical queries. While LLMs showed promise in providing accurate information about AMD, further improvements are needed, especially for more technical questions. |
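The METHODS sentence above compresses the statistical workflow: each model's responses are scored on a 3-point scale, the three models' mean scores are compared with a non-parametric test, and inter-model agreement is summarized with Cronbach's α. Below is a minimal Python sketch of that workflow, not the authors' code: the score arrays are hypothetical stand-ins for the paper's data, and the Kruskal-Wallis H test is assumed as the non-parametric test, since the abstract does not name the specific one used.

```python
# Illustrative sketch of the abstract's analysis; scores are hypothetical.
# Scale per the paper: 1 = accurate and sufficient, 2 = partially accurate
# but sufficient, 3 = inaccurate and not sufficient.
import numpy as np
from scipy import stats

# Hypothetical per-question scores for one question category.
chatgpt = np.array([1, 1, 1, 2, 1, 1, 2, 1, 1, 1])
bing    = np.array([1, 2, 2, 1, 2, 1, 2, 2, 1, 2])
bard    = np.array([2, 1, 2, 1, 1, 2, 2, 1, 2, 2])

# Non-parametric comparison of the three models' scores
# (Kruskal-Wallis H test, assumed here).
h_stat, p_value = stats.kruskal(chatgpt, bing, bard)
print(f"Kruskal-Wallis H = {h_stat:.3f}, p = {p_value:.4f}")

def cronbach_alpha(item_scores: np.ndarray) -> float:
    """Cronbach's alpha; rows = questions, columns = models (raters)."""
    item_vars = item_scores.var(axis=0, ddof=1)      # variance per model
    total_var = item_scores.sum(axis=1).var(ddof=1)  # variance of row sums
    k = item_scores.shape[1]                         # number of models
    return k / (k - 1) * (1 - item_vars.sum() / total_var)

scores = np.column_stack([chatgpt, bing, bard])
print(f"Cronbach's alpha = {cronbach_alpha(scores):.3f}")
```

On the paper's own numbers, the reported α of 0.237 falls well below the conventional 0.7 acceptability threshold, indicating weak agreement among the three models' scores.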
format | Online Article Text |
id | pubmed-10657493 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2023 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-10657493 2023-11-18 “Application and accuracy of artificial intelligence-derived large language models in patients with age related macular degeneration” Ferro Desideri, Lorenzo; Roth, Janice; Zinkernagel, Martin; Anguita, Rodrigo. Int J Retina Vitreous, Original Article. INTRODUCTION: Age-related macular degeneration (AMD) affects millions of people globally, leading to a surge in online research of putative diagnoses and causing potential misinformation and anxiety in patients and their relatives. This study explores the efficacy of artificial intelligence-derived large language models (LLMs) in addressing AMD patients' questions. METHODS: ChatGPT 3.5 (2023), Bing AI (2023), and Google Bard (2023) were adopted as LLMs. Patients' questions were subdivided into two categories, (a) general medical advice and (b) pre- and post-intravitreal injection advice, and the responses were classified as (1) accurate and sufficient, (2) partially accurate but sufficient, and (3) inaccurate and not sufficient. A non-parametric test was used to compare mean scores between the 3 LLMs, and an analysis of variance and reliability tests were performed among the 3 groups. RESULTS: In question category (a), the average score was 1.20 (± 0.41) with ChatGPT 3.5, 1.60 (± 0.63) with Bing AI and 1.60 (± 0.73) with Google Bard, showing no significant differences among the 3 groups (p = 0.129). The average score in category (b) was 1.07 (± 0.27) with ChatGPT 3.5, 1.69 (± 0.63) with Bing AI and 1.38 (± 0.63) with Google Bard, showing a significant difference among the 3 groups (p = 0.0042). Reliability statistics showed a Cronbach's α of 0.237 (range 0.448, 0.096–0.544). CONCLUSION: ChatGPT 3.5 consistently offered the most accurate and satisfactory responses, particularly for technical queries. While LLMs showed promise in providing accurate information about AMD, further improvements are needed, especially for more technical questions. BioMed Central 2023-11-18 /pmc/articles/PMC10657493/ /pubmed/37980501 http://dx.doi.org/10.1186/s40942-023-00511-7 Text en © The Author(s) 2023 https://creativecommons.org/licenses/by/4.0/ Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit https://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (https://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data. |
spellingShingle | Original Article; Ferro Desideri, Lorenzo; Roth, Janice; Zinkernagel, Martin; Anguita, Rodrigo; “Application and accuracy of artificial intelligence-derived large language models in patients with age related macular degeneration”
title | “Application and accuracy of artificial intelligence-derived large language models in patients with age related macular degeneration” |
title_full | “Application and accuracy of artificial intelligence-derived large language models in patients with age related macular degeneration” |
title_fullStr | “Application and accuracy of artificial intelligence-derived large language models in patients with age related macular degeneration” |
title_full_unstemmed | “Application and accuracy of artificial intelligence-derived large language models in patients with age related macular degeneration” |
title_short | “Application and accuracy of artificial intelligence-derived large language models in patients with age related macular degeneration” |
title_sort | “application and accuracy of artificial intelligence-derived large language models in patients with age related macular degeneration” |
topic | Original Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10657493/ https://www.ncbi.nlm.nih.gov/pubmed/37980501 http://dx.doi.org/10.1186/s40942-023-00511-7 |
work_keys_str_mv | AT ferrodesiderilorenzo applicationandaccuracyofartificialintelligencederivedlargelanguagemodelsinpatientswithagerelatedmaculardegeneration AT rothjanice applicationandaccuracyofartificialintelligencederivedlargelanguagemodelsinpatientswithagerelatedmaculardegeneration AT zinkernagelmartin applicationandaccuracyofartificialintelligencederivedlargelanguagemodelsinpatientswithagerelatedmaculardegeneration AT anguitarodrigo applicationandaccuracyofartificialintelligencederivedlargelanguagemodelsinpatientswithagerelatedmaculardegeneration |