
Evaluating the Utility of a Large Language Model in Answering Common Patients’ Gastrointestinal Health-Related Questions: Are We There Yet?

Background and aims: Patients frequently have concerns about their disease and find it challenging to obtain accurate information. OpenAI’s ChatGPT chatbot (ChatGPT) is a new large language model developed to provide answers to a wide range of questions in various fields. Our aim is to evaluate the performance of ChatGPT in answering patients’ questions regarding gastrointestinal health. Methods: To evaluate this performance, we used a representative sample of 110 real-life questions. The answers provided by ChatGPT were rated in consensus by three experienced gastroenterologists, who assessed each answer’s accuracy, clarity, and efficacy. Results: ChatGPT provided accurate and clear answers to patients’ questions in some cases, but not in others. For questions about treatments, the average accuracy, clarity, and efficacy scores (1 to 5) were 3.9 ± 0.8, 3.9 ± 0.9, and 3.3 ± 0.9, respectively. For questions about symptoms, the average scores were 3.4 ± 0.8, 3.7 ± 0.7, and 3.2 ± 0.7, respectively. For questions about diagnostic tests, the average scores were 3.7 ± 1.7, 3.7 ± 1.8, and 3.5 ± 1.7, respectively. Conclusions: While ChatGPT has potential as a source of information, further development is needed; the quality of its answers is contingent upon the quality of the online information it was trained on. These findings may be useful for healthcare providers and patients alike in understanding the capabilities and limitations of ChatGPT.
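The abstract reports each question category as mean ± standard deviation over 1-to-5 consensus ratings. As a minimal illustration of that summary computation (the study’s per-question raw scores are not reproduced in this record, so all ratings below are hypothetical placeholders):

    # Minimal sketch: summarizing 1-5 expert ratings as mean ± SD per
    # question category, mirroring how the abstract reports results.
    # All ratings below are hypothetical placeholders, not study data.
    from statistics import mean, stdev

    ratings = {
        "treatments":       {"accuracy": [4, 5, 3, 4], "clarity": [4, 4, 5, 3], "efficacy": [3, 4, 3, 3]},
        "symptoms":         {"accuracy": [3, 4, 3, 4], "clarity": [4, 3, 4, 4], "efficacy": [3, 3, 4, 3]},
        "diagnostic tests": {"accuracy": [5, 2, 4, 4], "clarity": [5, 2, 4, 4], "efficacy": [5, 2, 3, 4]},
    }

    for category, dims in ratings.items():
        summary = ", ".join(
            f"{dim} {mean(scores):.1f} ± {stdev(scores):.1f}"
            for dim, scores in dims.items()
        )
        print(f"{category}: {summary}")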

Bibliographic Details
Main Authors: Lahat, Adi, Shachar, Eyal, Avidan, Benjamin, Glicksberg, Benjamin, Klang, Eyal
Format: Online Article Text
Language: English
Published: MDPI 2023
Subjects: Communication
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10252924/
https://www.ncbi.nlm.nih.gov/pubmed/37296802
http://dx.doi.org/10.3390/diagnostics13111950
_version_ 1785056286255087616
author Lahat, Adi
Shachar, Eyal
Avidan, Benjamin
Glicksberg, Benjamin
Klang, Eyal
author_facet Lahat, Adi
Shachar, Eyal
Avidan, Benjamin
Glicksberg, Benjamin
Klang, Eyal
author_sort Lahat, Adi
collection PubMed
description Background and aims: Patients frequently have concerns about their disease and find it challenging to obtain accurate information. OpenAI’s ChatGPT chatbot (ChatGPT) is a new large language model developed to provide answers to a wide range of questions in various fields. Our aim is to evaluate the performance of ChatGPT in answering patients’ questions regarding gastrointestinal health. Methods: To evaluate this performance, we used a representative sample of 110 real-life questions. The answers provided by ChatGPT were rated in consensus by three experienced gastroenterologists, who assessed each answer’s accuracy, clarity, and efficacy. Results: ChatGPT provided accurate and clear answers to patients’ questions in some cases, but not in others. For questions about treatments, the average accuracy, clarity, and efficacy scores (1 to 5) were 3.9 ± 0.8, 3.9 ± 0.9, and 3.3 ± 0.9, respectively. For questions about symptoms, the average scores were 3.4 ± 0.8, 3.7 ± 0.7, and 3.2 ± 0.7, respectively. For questions about diagnostic tests, the average scores were 3.7 ± 1.7, 3.7 ± 1.8, and 3.5 ± 1.7, respectively. Conclusions: While ChatGPT has potential as a source of information, further development is needed; the quality of its answers is contingent upon the quality of the online information it was trained on. These findings may be useful for healthcare providers and patients alike in understanding the capabilities and limitations of ChatGPT.
format Online
Article
Text
id pubmed-10252924
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher MDPI
record_format MEDLINE/PubMed
spelling pubmed-10252924 2023-06-10 Evaluating the Utility of a Large Language Model in Answering Common Patients’ Gastrointestinal Health-Related Questions: Are We There Yet? Lahat, Adi Shachar, Eyal Avidan, Benjamin Glicksberg, Benjamin Klang, Eyal Diagnostics (Basel) Communication Background and aims: Patients frequently have concerns about their disease and find it challenging to obtain accurate information. OpenAI’s ChatGPT chatbot (ChatGPT) is a new large language model developed to provide answers to a wide range of questions in various fields. Our aim is to evaluate the performance of ChatGPT in answering patients’ questions regarding gastrointestinal health. Methods: To evaluate this performance, we used a representative sample of 110 real-life questions. The answers provided by ChatGPT were rated in consensus by three experienced gastroenterologists, who assessed each answer’s accuracy, clarity, and efficacy. Results: ChatGPT provided accurate and clear answers to patients’ questions in some cases, but not in others. For questions about treatments, the average accuracy, clarity, and efficacy scores (1 to 5) were 3.9 ± 0.8, 3.9 ± 0.9, and 3.3 ± 0.9, respectively. For questions about symptoms, the average scores were 3.4 ± 0.8, 3.7 ± 0.7, and 3.2 ± 0.7, respectively. For questions about diagnostic tests, the average scores were 3.7 ± 1.7, 3.7 ± 1.8, and 3.5 ± 1.7, respectively. Conclusions: While ChatGPT has potential as a source of information, further development is needed; the quality of its answers is contingent upon the quality of the online information it was trained on. These findings may be useful for healthcare providers and patients alike in understanding the capabilities and limitations of ChatGPT. MDPI 2023-06-02 /pmc/articles/PMC10252924/ /pubmed/37296802 http://dx.doi.org/10.3390/diagnostics13111950 Text en © 2023 by the authors. https://creativecommons.org/licenses/by/4.0/ Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
spellingShingle Communication
Lahat, Adi
Shachar, Eyal
Avidan, Benjamin
Glicksberg, Benjamin
Klang, Eyal
Evaluating the Utility of a Large Language Model in Answering Common Patients’ Gastrointestinal Health-Related Questions: Are We There Yet?
title Evaluating the Utility of a Large Language Model in Answering Common Patients’ Gastrointestinal Health-Related Questions: Are We There Yet?
title_full Evaluating the Utility of a Large Language Model in Answering Common Patients’ Gastrointestinal Health-Related Questions: Are We There Yet?
title_fullStr Evaluating the Utility of a Large Language Model in Answering Common Patients’ Gastrointestinal Health-Related Questions: Are We There Yet?
title_full_unstemmed Evaluating the Utility of a Large Language Model in Answering Common Patients’ Gastrointestinal Health-Related Questions: Are We There Yet?
title_short Evaluating the Utility of a Large Language Model in Answering Common Patients’ Gastrointestinal Health-Related Questions: Are We There Yet?
title_sort evaluating the utility of a large language model in answering common patients’ gastrointestinal health-related questions: are we there yet?
topic Communication
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10252924/
https://www.ncbi.nlm.nih.gov/pubmed/37296802
http://dx.doi.org/10.3390/diagnostics13111950
work_keys_str_mv AT lahatadi evaluatingtheutilityofalargelanguagemodelinansweringcommonpatientsgastrointestinalhealthrelatedquestionsarewethereyet
AT shachareyal evaluatingtheutilityofalargelanguagemodelinansweringcommonpatientsgastrointestinalhealthrelatedquestionsarewethereyet
AT avidanbenjamin evaluatingtheutilityofalargelanguagemodelinansweringcommonpatientsgastrointestinalhealthrelatedquestionsarewethereyet
AT glicksbergbenjamin evaluatingtheutilityofalargelanguagemodelinansweringcommonpatientsgastrointestinalhealthrelatedquestionsarewethereyet
AT klangeyal evaluatingtheutilityofalargelanguagemodelinansweringcommonpatientsgastrointestinalhealthrelatedquestionsarewethereyet