Popular large language model chatbots’ accuracy, comprehensiveness, and self-awareness in answering ocular symptom queries

In light of growing interest in using emerging large language models (LLMs) for self-diagnosis, we systematically assessed the performance of ChatGPT-3.5, ChatGPT-4.0, and Google Bard in delivering proficient responses to 37 common inquiries regarding ocular symptoms. Responses were masked, randomly...

Bibliographic Details
Main Authors: Pushpanathan, Krithi, Lim, Zhi Wei, Er Yew, Samantha Min, Chen, David Ziyou, Hui'En Lin, Hazel Anne, Lin Goh, Jocelyn Hui, Wong, Wendy Meihua, Wang, Xiaofei, Jin Tan, Marcus Chun, Chang Koh, Victor Teck, Tham, Yih-Chung
Format: Online Article Text
Language: English
Published: Elsevier 2023
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10616302/
https://www.ncbi.nlm.nih.gov/pubmed/37915603
http://dx.doi.org/10.1016/j.isci.2023.108163
author Pushpanathan, Krithi
Lim, Zhi Wei
Er Yew, Samantha Min
Chen, David Ziyou
Hui'En Lin, Hazel Anne
Lin Goh, Jocelyn Hui
Wong, Wendy Meihua
Wang, Xiaofei
Jin Tan, Marcus Chun
Chang Koh, Victor Teck
Tham, Yih-Chung
collection PubMed
description In light of growing interest in using emerging large language models (LLMs) for self-diagnosis, we systematically assessed the performance of ChatGPT-3.5, ChatGPT-4.0, and Google Bard in delivering proficient responses to 37 common inquiries regarding ocular symptoms. Responses were masked, randomly shuffled, and then graded by three consultant-level ophthalmologists for accuracy (poor, borderline, good) and comprehensiveness. Additionally, we evaluated the self-awareness capabilities (ability to self-check and self-correct) of the LLM-Chatbots. 89.2% of ChatGPT-4.0 responses were ‘good’-rated, outperforming ChatGPT-3.5 (59.5%) and Google Bard (40.5%) significantly (all p < 0.001). All three LLM-Chatbots showed optimal mean comprehensiveness scores as well (ranging from 4.6 to 4.7 out of 5). However, they exhibited subpar to moderate self-awareness capabilities. Our study underscores the potential of ChatGPT-4.0 in delivering accurate and comprehensive responses to ocular symptom inquiries. Future rigorous validation of their performance is crucial to ensure their reliability and appropriateness for actual clinical use.
format Online
Article
Text
id pubmed-10616302
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher Elsevier
record_format MEDLINE/PubMed
spelling pubmed-10616302 2023-11-01 Popular large language model chatbots’ accuracy, comprehensiveness, and self-awareness in answering ocular symptom queries Pushpanathan, Krithi Lim, Zhi Wei Er Yew, Samantha Min Chen, David Ziyou Hui'En Lin, Hazel Anne Lin Goh, Jocelyn Hui Wong, Wendy Meihua Wang, Xiaofei Jin Tan, Marcus Chun Chang Koh, Victor Teck Tham, Yih-Chung iScience Article In light of growing interest in using emerging large language models (LLMs) for self-diagnosis, we systematically assessed the performance of ChatGPT-3.5, ChatGPT-4.0, and Google Bard in delivering proficient responses to 37 common inquiries regarding ocular symptoms. Responses were masked, randomly shuffled, and then graded by three consultant-level ophthalmologists for accuracy (poor, borderline, good) and comprehensiveness. Additionally, we evaluated the self-awareness capabilities (ability to self-check and self-correct) of the LLM-Chatbots. 89.2% of ChatGPT-4.0 responses were ‘good’-rated, outperforming ChatGPT-3.5 (59.5%) and Google Bard (40.5%) significantly (all p < 0.001). All three LLM-Chatbots showed optimal mean comprehensiveness scores as well (ranging from 4.6 to 4.7 out of 5). However, they exhibited subpar to moderate self-awareness capabilities. Our study underscores the potential of ChatGPT-4.0 in delivering accurate and comprehensive responses to ocular symptom inquiries. Future rigorous validation of their performance is crucial to ensure their reliability and appropriateness for actual clinical use. Elsevier 2023-10-10 /pmc/articles/PMC10616302/ /pubmed/37915603 http://dx.doi.org/10.1016/j.isci.2023.108163 Text en © 2023 The Authors https://creativecommons.org/licenses/by-nc-nd/4.0/ This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).
title Popular large language model chatbots’ accuracy, comprehensiveness, and self-awareness in answering ocular symptom queries
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10616302/
https://www.ncbi.nlm.nih.gov/pubmed/37915603
http://dx.doi.org/10.1016/j.isci.2023.108163