Reliability of Medical Information Provided by ChatGPT: Assessment Against Clinical Guidelines and Patient Information Quality Instrument

Bibliographic Details
Main Authors: Walker, Harriet Louise; Ghani, Shahi; Kuemmerli, Christoph; Nebiker, Christian Andreas; Müller, Beat Peter; Raptis, Dimitri Aristotle; Staubli, Sebastian Manuel
Format: Online Article Text
Language: English
Published: JMIR Publications 2023
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10365578/
https://www.ncbi.nlm.nih.gov/pubmed/37389908
http://dx.doi.org/10.2196/47479
_version_ 1785077022191517696
author Walker, Harriet Louise
Ghani, Shahi
Kuemmerli, Christoph
Nebiker, Christian Andreas
Müller, Beat Peter
Raptis, Dimitri Aristotle
Staubli, Sebastian Manuel
author_sort Walker, Harriet Louise
collection PubMed
description BACKGROUND: ChatGPT-4 is the latest release of a novel artificial intelligence (AI) chatbot able to answer freely formulated and complex questions. In the near future, ChatGPT could become the new standard for health care professionals and patients to access medical information. However, little is known about the quality of medical information provided by the AI. OBJECTIVE: We aimed to assess the reliability of medical information provided by ChatGPT. METHODS: Medical information provided by ChatGPT-4 on the 5 hepato-pancreatico-biliary (HPB) conditions with the highest global disease burden was measured with the Ensuring Quality Information for Patients (EQIP) tool. The EQIP tool is used to measure the quality of internet-available information and consists of 36 items that are divided into 3 subsections. In addition, 5 guideline recommendations per analyzed condition were rephrased as questions and input to ChatGPT, and agreement between the guidelines and the AI answer was measured by 2 authors independently. All queries were repeated 3 times to measure the internal consistency of ChatGPT. RESULTS: Five conditions were identified (gallstone disease, pancreatitis, liver cirrhosis, pancreatic cancer, and hepatocellular carcinoma). The median EQIP score across all conditions was 16 (IQR 14.5-18) for the total of 36 items. Divided by subsection, median scores for content, identification, and structure data were 10 (IQR 9.5-12.5), 1 (IQR 1-1), and 4 (IQR 4-5), respectively. Agreement between guideline recommendations and answers provided by ChatGPT was 60% (15/25). Interrater agreement as measured by the Fleiss κ was 0.78 (P<.001), indicating substantial agreement. Internal consistency of the answers provided by ChatGPT was 100%. CONCLUSIONS: ChatGPT provides medical information of comparable quality to available static internet information. 
Although currently of limited quality, large language models could become the future standard for patients and health care professionals to gather medical information.
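The two agreement statistics reported in the abstract can be illustrated with a minimal Python sketch. The function names `percent_agreement` and `fleiss_kappa` are hypothetical, and the rating tables in the usage note are toy values chosen for illustration; they are not the study's data, and this is not the authors' analysis code. Only the 15/25 count is taken from the abstract.

```python
from typing import List


def percent_agreement(agreeing: int, total: int) -> float:
    """Share of answers that agree with the guideline, as a percentage."""
    return 100.0 * agreeing / total


def fleiss_kappa(table: List[List[int]]) -> float:
    """Fleiss' kappa for a subjects-by-categories table of rating counts.

    table[i][j] is the number of raters who put subject i into category j.
    Every row must sum to the same number of raters (at least 2).
    """
    n_subjects = len(table)
    n_raters = sum(table[0])
    n_categories = len(table[0])
    if any(sum(row) != n_raters for row in table):
        raise ValueError("every subject needs the same number of ratings")

    # Proportion of all ratings that fall into each category.
    p_cat = [
        sum(row[j] for row in table) / (n_subjects * n_raters)
        for j in range(n_categories)
    ]
    # Observed agreement per subject, then averaged over subjects.
    p_subj = [
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in table
    ]
    p_observed = sum(p_subj) / n_subjects
    # Agreement expected by chance from the category proportions.
    p_expected = sum(p * p for p in p_cat)
    return (p_observed - p_expected) / (1.0 - p_expected)


if __name__ == "__main__":
    # 15 of 25 answers agreed with the guidelines (count from the abstract).
    print(percent_agreement(15, 25))
    # Toy table (NOT study data): 2 raters, 4 items, categories agree/disagree.
    print(fleiss_kappa([[2, 0], [2, 0], [0, 2], [0, 2]]))
```

With the toy table above, both raters give identical ratings on every item, so the sketch returns a kappa of 1.0; a table in which the raters split on half the items, such as `[[2, 0], [1, 1], [0, 2], [1, 1]]`, gives 0.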
format Online
Article
Text
id pubmed-10365578
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher JMIR Publications
record_format MEDLINE/PubMed
spelling pubmed-10365578 2023-07-25 Reliability of Medical Information Provided by ChatGPT: Assessment Against Clinical Guidelines and Patient Information Quality Instrument. Walker, Harriet Louise; Ghani, Shahi; Kuemmerli, Christoph; Nebiker, Christian Andreas; Müller, Beat Peter; Raptis, Dimitri Aristotle; Staubli, Sebastian Manuel. J Med Internet Res. Original Paper. JMIR Publications 2023-06-30 /pmc/articles/PMC10365578/ /pubmed/37389908 http://dx.doi.org/10.2196/47479 Text en ©Harriet Louise Walker, Shahi Ghani, Christoph Kuemmerli, Christian Andreas Nebiker, Beat Peter Müller, Dimitri Aristotle Raptis, Sebastian Manuel Staubli. Originally published in the Journal of Medical Internet Research (https://www.jmir.org), 30.06.2023. https://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in the Journal of Medical Internet Research, is properly cited. The complete bibliographic information, a link to the original publication on https://www.jmir.org/, as well as this copyright and license information must be included.
title Reliability of Medical Information Provided by ChatGPT: Assessment Against Clinical Guidelines and Patient Information Quality Instrument
topic Original Paper
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10365578/
https://www.ncbi.nlm.nih.gov/pubmed/37389908
http://dx.doi.org/10.2196/47479