Evaluating the Utility of a Large Language Model in Answering Common Patients’ Gastrointestinal Health-Related Questions: Are We There Yet?
Main Authors: | Lahat, Adi; Shachar, Eyal; Avidan, Benjamin; Glicksberg, Benjamin; Klang, Eyal |
Format: | Online Article Text |
Language: | English |
Published: | MDPI 2023 |
Subjects: | Communication |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10252924/ https://www.ncbi.nlm.nih.gov/pubmed/37296802 http://dx.doi.org/10.3390/diagnostics13111950 |
_version_ | 1785056286255087616 |
author | Lahat, Adi; Shachar, Eyal; Avidan, Benjamin; Glicksberg, Benjamin; Klang, Eyal |
author_facet | Lahat, Adi; Shachar, Eyal; Avidan, Benjamin; Glicksberg, Benjamin; Klang, Eyal |
author_sort | Lahat, Adi |
collection | PubMed |
description | Background and aims: Patients frequently have concerns about their disease and find it challenging to obtain accurate information. OpenAI’s ChatGPT chatbot (ChatGPT) is a new large language model developed to provide answers to a wide range of questions in various fields. Our aim is to evaluate the performance of ChatGPT in answering patients’ questions regarding gastrointestinal health. Methods: To evaluate the performance of ChatGPT in answering patients’ questions, we used a representative sample of 110 real-life questions. The answers provided by ChatGPT were rated in consensus by three experienced gastroenterologists. The accuracy, clarity, and efficacy of the answers provided by ChatGPT were assessed. Results: ChatGPT was able to provide accurate and clear answers to patients’ questions in some cases, but not in others. For questions about treatments, the average accuracy, clarity, and efficacy scores (1 to 5) were 3.9 ± 0.8, 3.9 ± 0.9, and 3.3 ± 0.9, respectively. For symptom questions, the average accuracy, clarity, and efficacy scores were 3.4 ± 0.8, 3.7 ± 0.7, and 3.2 ± 0.7, respectively. For diagnostic test questions, the average accuracy, clarity, and efficacy scores were 3.7 ± 1.7, 3.7 ± 1.8, and 3.5 ± 1.7, respectively. Conclusions: While ChatGPT has potential as a source of information, further development is needed. The quality of information is contingent upon the quality of the online information provided. These findings may be useful for healthcare providers and patients alike in understanding the capabilities and limitations of ChatGPT. |
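The per-category results in the abstract are reported as mean ± standard deviation on a 1-to-5 scale. As a minimal sketch of how such summaries are computed, the following Python snippet aggregates consensus ratings per category; the rating lists are illustrative placeholders, not the study's actual data:

```python
from statistics import mean, stdev

# Hypothetical consensus ratings (1-5 scale) per question category.
# These values are placeholders for illustration, not the study's raw data.
ratings = {
    "treatments": {"accuracy": [4, 4, 3, 5, 4], "clarity": [4, 3, 4, 4, 5]},
    "symptoms": {"accuracy": [3, 4, 3, 4, 3], "clarity": [4, 4, 3, 4, 3]},
}

def summarize(scores):
    """Format a score list as 'mean ± SD', matching how the abstract reports results."""
    return f"{mean(scores):.1f} ± {stdev(scores):.1f}"

for category, dimensions in ratings.items():
    summary = ", ".join(f"{dim}: {summarize(s)}" for dim, s in dimensions.items())
    print(f"{category}: {summary}")
```

Sample standard deviation (`stdev`) is used here, as is conventional in clinical reporting; whether the authors used sample or population SD is not stated in the abstract.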
format | Online Article Text |
id | pubmed-10252924 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2023 |
publisher | MDPI |
record_format | MEDLINE/PubMed |
spelling | pubmed-10252924 2023-06-10 Evaluating the Utility of a Large Language Model in Answering Common Patients’ Gastrointestinal Health-Related Questions: Are We There Yet? Lahat, Adi; Shachar, Eyal; Avidan, Benjamin; Glicksberg, Benjamin; Klang, Eyal Diagnostics (Basel) Communication
MDPI 2023-06-02 /pmc/articles/PMC10252924/ /pubmed/37296802 http://dx.doi.org/10.3390/diagnostics13111950 Text en © 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/). |
spellingShingle | Communication Lahat, Adi; Shachar, Eyal; Avidan, Benjamin; Glicksberg, Benjamin; Klang, Eyal Evaluating the Utility of a Large Language Model in Answering Common Patients’ Gastrointestinal Health-Related Questions: Are We There Yet? |
title | Evaluating the Utility of a Large Language Model in Answering Common Patients’ Gastrointestinal Health-Related Questions: Are We There Yet? |
title_full | Evaluating the Utility of a Large Language Model in Answering Common Patients’ Gastrointestinal Health-Related Questions: Are We There Yet? |
title_fullStr | Evaluating the Utility of a Large Language Model in Answering Common Patients’ Gastrointestinal Health-Related Questions: Are We There Yet? |
title_full_unstemmed | Evaluating the Utility of a Large Language Model in Answering Common Patients’ Gastrointestinal Health-Related Questions: Are We There Yet? |
title_short | Evaluating the Utility of a Large Language Model in Answering Common Patients’ Gastrointestinal Health-Related Questions: Are We There Yet? |
title_sort | evaluating the utility of a large language model in answering common patients’ gastrointestinal health-related questions: are we there yet? |
topic | Communication |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10252924/ https://www.ncbi.nlm.nih.gov/pubmed/37296802 http://dx.doi.org/10.3390/diagnostics13111950 |
work_keys_str_mv | AT lahatadi evaluatingtheutilityofalargelanguagemodelinansweringcommonpatientsgastrointestinalhealthrelatedquestionsarewethereyet AT shachareyal evaluatingtheutilityofalargelanguagemodelinansweringcommonpatientsgastrointestinalhealthrelatedquestionsarewethereyet AT avidanbenjamin evaluatingtheutilityofalargelanguagemodelinansweringcommonpatientsgastrointestinalhealthrelatedquestionsarewethereyet AT glicksbergbenjamin evaluatingtheutilityofalargelanguagemodelinansweringcommonpatientsgastrointestinalhealthrelatedquestionsarewethereyet AT klangeyal evaluatingtheutilityofalargelanguagemodelinansweringcommonpatientsgastrointestinalhealthrelatedquestionsarewethereyet |