
Comparison of Ophthalmologist and Large Language Model Chatbot Responses to Online Patient Eye Care Questions


Bibliographic Details
Main Authors:	Bernstein, Isaac A., Zhang, Youchen (Victor), Govil, Devendra, Majid, Iyad, Chang, Robert T., Sun, Yang, Shue, Ann, Chou, Jonathan C., Schehlein, Emily, Christopher, Karen L., Groth, Sylvia L., Ludwig, Cassie, Wang, Sophia Y.
Format:	Online Article Text
Language:	English
Published:	American Medical Association, 2023
Subjects:	Original Investigation
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10445188/ https://www.ncbi.nlm.nih.gov/pubmed/37606922 http://dx.doi.org/10.1001/jamanetworkopen.2023.30320

collection	PubMed
description	IMPORTANCE: Large language models (LLMs) like ChatGPT appear capable of performing a variety of tasks, including answering patient eye care questions, but have not yet been evaluated in direct comparison with ophthalmologists. It remains unclear whether LLM-generated advice is accurate, appropriate, and safe for eye patients.
OBJECTIVE: To evaluate the quality of ophthalmology advice generated by an LLM chatbot in comparison with ophthalmologist-written advice.
DESIGN, SETTING, AND PARTICIPANTS: This cross-sectional study used deidentified data from an online medical forum, in which patient questions received responses written by American Academy of Ophthalmology (AAO)–affiliated ophthalmologists. A masked panel of 8 board-certified ophthalmologists were asked to distinguish between answers generated by the ChatGPT chatbot and human answers. Posts were dated between 2007 and 2016; data were accessed January 2023 and analysis was performed between March and May 2023.
MAIN OUTCOMES AND MEASURES: Identification of chatbot and human answers on a 4-point scale (likely or definitely artificial intelligence [AI] vs likely or definitely human) and evaluation of responses for presence of incorrect information, alignment with perceived consensus in the medical community, likelihood to cause harm, and extent of harm.
RESULTS: A total of 200 pairs of user questions and answers by AAO-affiliated ophthalmologists were evaluated. The mean (SD) accuracy for distinguishing between AI and human responses was 61.3% (9.7%). Of 800 evaluations of chatbot-written answers, 168 answers (21.0%) were marked as human-written, while 517 of 800 human-written answers (64.6%) were marked as AI-written. Compared with human answers, chatbot answers were more frequently rated as probably or definitely written by AI (prevalence ratio [PR], 1.72; 95% CI, 1.52-1.93). The likelihood of chatbot answers containing incorrect or inappropriate material was comparable with human answers (PR, 0.92; 95% CI, 0.77-1.10), and did not differ from human answers in terms of likelihood of harm (PR, 0.84; 95% CI, 0.67-1.07) nor extent of harm (PR, 0.99; 95% CI, 0.80-1.22).
CONCLUSIONS AND RELEVANCE: In this cross-sectional study of human-written and AI-generated responses to 200 eye care questions from an online advice forum, a chatbot appeared capable of responding to long user-written eye health posts and largely generated appropriate responses that did not differ significantly from ophthalmologist-written responses in terms of incorrect information, likelihood of harm, extent of harm, or deviation from ophthalmologist community standards. Additional research is needed to assess patient attitudes toward LLM-augmented ophthalmologists vs fully autonomous AI content generation, to evaluate clarity and acceptability of LLM-generated answers from the patient perspective, to test the performance of LLMs in a greater variety of clinical contexts, and to determine an optimal manner of utilizing LLMs that is ethical and minimizes harm.
id	pubmed-10445188
institution	National Center for Biotechnology Information
record_format	MEDLINE/PubMed
journal	JAMA Netw Open
publication_date	2023-08-22
rights	Copyright 2023 Bernstein IA et al. JAMA Network Open. This is an open access article distributed under the terms of the CC-BY License (https://creativecommons.org/licenses/by/4.0/).
