Popular large language model chatbots’ accuracy, comprehensiveness, and self-awareness in answering ocular symptom queries
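Main authors: | Pushpanathan, Krithi; Lim, Zhi Wei; Er Yew, Samantha Min; Chen, David Ziyou; Hui'En Lin, Hazel Anne; Lin Goh, Jocelyn Hui; Wong, Wendy Meihua; Wang, Xiaofei; Jin Tan, Marcus Chun; Chang Koh, Victor Teck; Tham, Yih-Chung |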
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Elsevier, 2023 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10616302/ https://www.ncbi.nlm.nih.gov/pubmed/37915603 http://dx.doi.org/10.1016/j.isci.2023.108163 |
author | Pushpanathan, Krithi; Lim, Zhi Wei; Er Yew, Samantha Min; Chen, David Ziyou; Hui'En Lin, Hazel Anne; Lin Goh, Jocelyn Hui; Wong, Wendy Meihua; Wang, Xiaofei; Jin Tan, Marcus Chun; Chang Koh, Victor Teck; Tham, Yih-Chung |
collection | PubMed |
description | In light of growing interest in using emerging large language models (LLMs) for self-diagnosis, we systematically assessed the performance of ChatGPT-3.5, ChatGPT-4.0, and Google Bard in delivering proficient responses to 37 common inquiries regarding ocular symptoms. Responses were masked, randomly shuffled, and then graded by three consultant-level ophthalmologists for accuracy (poor, borderline, good) and comprehensiveness. Additionally, we evaluated the self-awareness capabilities (ability to self-check and self-correct) of the LLM-Chatbots. 89.2% of ChatGPT-4.0 responses were ‘good’-rated, outperforming ChatGPT-3.5 (59.5%) and Google Bard (40.5%) significantly (all p < 0.001). All three LLM-Chatbots showed optimal mean comprehensiveness scores as well (ranging from 4.6 to 4.7 out of 5). However, they exhibited subpar to moderate self-awareness capabilities. Our study underscores the potential of ChatGPT-4.0 in delivering accurate and comprehensive responses to ocular symptom inquiries. Future rigorous validation of their performance is crucial to ensure their reliability and appropriateness for actual clinical use. |
format | Online Article Text |
id | pubmed-10616302 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2023 |
publisher | Elsevier |
record_format | MEDLINE/PubMed |
spelling | pubmed-10616302 2023-11-01 Popular large language model chatbots’ accuracy, comprehensiveness, and self-awareness in answering ocular symptom queries Pushpanathan, Krithi Lim, Zhi Wei Er Yew, Samantha Min Chen, David Ziyou Hui'En Lin, Hazel Anne Lin Goh, Jocelyn Hui Wong, Wendy Meihua Wang, Xiaofei Jin Tan, Marcus Chun Chang Koh, Victor Teck Tham, Yih-Chung iScience Article In light of growing interest in using emerging large language models (LLMs) for self-diagnosis, we systematically assessed the performance of ChatGPT-3.5, ChatGPT-4.0, and Google Bard in delivering proficient responses to 37 common inquiries regarding ocular symptoms. Responses were masked, randomly shuffled, and then graded by three consultant-level ophthalmologists for accuracy (poor, borderline, good) and comprehensiveness. Additionally, we evaluated the self-awareness capabilities (ability to self-check and self-correct) of the LLM-Chatbots. 89.2% of ChatGPT-4.0 responses were ‘good’-rated, outperforming ChatGPT-3.5 (59.5%) and Google Bard (40.5%) significantly (all p < 0.001). All three LLM-Chatbots showed optimal mean comprehensiveness scores as well (ranging from 4.6 to 4.7 out of 5). However, they exhibited subpar to moderate self-awareness capabilities. Our study underscores the potential of ChatGPT-4.0 in delivering accurate and comprehensive responses to ocular symptom inquiries. Future rigorous validation of their performance is crucial to ensure their reliability and appropriateness for actual clinical use. Elsevier 2023-10-10 /pmc/articles/PMC10616302/ /pubmed/37915603 http://dx.doi.org/10.1016/j.isci.2023.108163 Text en © 2023 The Authors https://creativecommons.org/licenses/by-nc-nd/4.0/ This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/). |
title | Popular large language model chatbots’ accuracy, comprehensiveness, and self-awareness in answering ocular symptom queries |
topic | Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10616302/ https://www.ncbi.nlm.nih.gov/pubmed/37915603 http://dx.doi.org/10.1016/j.isci.2023.108163 |