Pilot Testing of a Tool to Standardize the Assessment of the Quality of Health Information Generated by Artificial Intelligence-Based Models

Bibliographic Details
Main Authors: Sallam, Malik; Barakat, Muna; Sallam, Mohammed
Format: Online Article Text
Language: English
Published: Cureus 2023
Subjects: Quality Improvement
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10674084/
https://www.ncbi.nlm.nih.gov/pubmed/38024074
http://dx.doi.org/10.7759/cureus.49373
_version_ 1785149676774752256
author Sallam, Malik
Barakat, Muna
Sallam, Mohammed
author_facet Sallam, Malik
Barakat, Muna
Sallam, Mohammed
author_sort Sallam, Malik
collection PubMed
description Background: Artificial intelligence (AI)-based conversational models, such as Chat Generative Pre-trained Transformer (ChatGPT), Microsoft Bing, and Google Bard, have emerged as valuable sources of health information for lay individuals. However, the accuracy of the information provided by these AI models remains a significant concern. This pilot study aimed to test a new tool with the following key themes for inclusion: Completeness of content, Lack of false information in the content, Evidence supporting the content, Appropriateness of the content, and Relevance, referred to as "CLEAR", designed to assess the quality of health information delivered by AI-based models.
Methods: Tool development involved a literature review on health information quality, followed by the initial establishment of the CLEAR tool, which comprised five items that aimed to assess the following: completeness, lack of false information, evidence support, appropriateness, and relevance. Each item was scored on a five-point Likert scale from excellent to poor. Content validity was checked by expert review. Pilot testing involved 32 healthcare professionals using the CLEAR tool to assess content on eight different health topics deliberately designed with varying qualities. Internal consistency was checked with Cronbach's alpha (α). Feedback from the pilot test resulted in language modifications to improve the clarity of the items. The final CLEAR tool was used to assess the quality of health information generated by four distinct AI models on five health topics. The AI models were ChatGPT 3.5, ChatGPT 4, Microsoft Bing, and Google Bard, and the generated content was scored by two independent raters, with Cohen's kappa (κ) used to assess inter-rater agreement.
Results: The final five CLEAR items were: (1) Is the content sufficient?; (2) Is the content accurate?; (3) Is the content evidence-based?; (4) Is the content clear, concise, and easy to understand?; and (5) Is the content free from irrelevant information? Pilot testing on the eight health topics revealed acceptable internal consistency, with a Cronbach's α range of 0.669-0.981. Use of the final CLEAR tool yielded the following average scores: Microsoft Bing (mean=24.4±0.42), ChatGPT-4 (mean=23.6±0.96), Google Bard (mean=21.2±1.79), and ChatGPT-3.5 (mean=20.6±5.20). Inter-rater agreement yielded the following Cohen's κ values: ChatGPT-3.5 (κ=0.875, P<.001), ChatGPT-4 (κ=0.780, P<.001), Microsoft Bing (κ=0.348, P=.037), and Google Bard (κ=0.749, P<.001).
Conclusions: The CLEAR tool is a brief yet helpful instrument that can aid in standardizing the testing of the quality of health information generated by AI-based models. Future studies are recommended to validate the utility of the CLEAR tool in the quality assessment of AI-generated health-related content using a larger sample across various complex health topics.
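The scoring scheme described in the abstract (five items, each rated 1-5, giving per-response totals of 5-25, internal consistency summarized with Cronbach's α, and inter-rater agreement with Cohen's κ) can be illustrated with a short Python sketch. This is not the authors' code: the ratings, function names, and the use of the unweighted form of κ below are illustrative assumptions, shown only to make the arithmetic concrete.

# Illustrative sketch only (not the CLEAR authors' code). All ratings below
# are hypothetical; kappa is the unweighted form computed from its definition.
from collections import Counter

CLEAR_ITEMS = [
    "Is the content sufficient?",
    "Is the content accurate?",
    "Is the content evidence-based?",
    "Is the content clear, concise, and easy to understand?",
    "Is the content free from irrelevant information?",
]

def clear_total(item_scores):
    """Sum five 1-5 Likert ratings into a CLEAR total (range 5-25)."""
    assert len(item_scores) == len(CLEAR_ITEMS)
    assert all(1 <= s <= 5 for s in item_scores)
    return sum(item_scores)

def cronbach_alpha(ratings):
    """Cronbach's alpha for a raters-by-items matrix of Likert scores."""
    k = len(ratings[0])
    def var(xs):  # sample variance
        m = sum(xs) / len(xs)
        return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)
    item_vars = [var([row[i] for row in ratings]) for i in range(k)]
    total_var = var([sum(row) for row in ratings])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)

def cohens_kappa(rater_a, rater_b):
    """Unweighted Cohen's kappa: (observed - chance agreement) / (1 - chance)."""
    n = len(rater_a)
    p_obs = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    ca, cb = Counter(rater_a), Counter(rater_b)
    p_exp = sum(ca[c] * cb[c] for c in set(ca) | set(cb)) / (n * n)
    return (p_obs - p_exp) / (1 - p_exp)

# Hypothetical ratings of one AI-generated answer by two raters:
rater_1 = [5, 4, 4, 5, 5]   # CLEAR total 23/25
rater_2 = [5, 4, 3, 5, 5]   # CLEAR total 22/25
pilot = [rater_1, rater_2, [4, 4, 3, 5, 4], [3, 3, 3, 4, 3]]

print(clear_total(rater_1), clear_total(rater_2))
print(round(cronbach_alpha(pilot), 3))           # consistency of the five items
print(round(cohens_kappa(rater_1, rater_2), 3))  # agreement on this one answer

In the study itself, κ was reported per AI model across two raters and five health topics; the snippet only shows the mechanics on made-up numbers.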
format Online
Article
Text
id pubmed-10674084
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher Cureus
record_format MEDLINE/PubMed
spelling pubmed-10674084 2023-11-24 Pilot Testing of a Tool to Standardize the Assessment of the Quality of Health Information Generated by Artificial Intelligence-Based Models Sallam, Malik Barakat, Muna Sallam, Mohammed Cureus Quality Improvement Background Artificial intelligence (AI)-based conversational models, such as Chat Generative Pre-trained Transformer (ChatGPT), Microsoft Bing, and Google Bard, have emerged as valuable sources of health information for lay individuals. However, the accuracy of the information provided by these AI models remains a significant concern. This pilot study aimed to test a new tool with key themes for inclusion as follows: Completeness of content, Lack of false information in the content, Evidence supporting the content, Appropriateness of the content, and Relevance, referred to as "CLEAR", designed to assess the quality of health information delivered by AI-based models. Methods Tool development involved a literature review on health information quality, followed by the initial establishment of the CLEAR tool, which comprised five items that aimed to assess the following: completeness, lack of false information, evidence support, appropriateness, and relevance. Each item was scored on a five-point Likert scale from excellent to poor. Content validity was checked by expert review. Pilot testing involved 32 healthcare professionals using the CLEAR tool to assess content on eight different health topics deliberately designed with varying qualities. The internal consistency was checked with Cronbach's alpha (α). Feedback from the pilot test resulted in language modifications to improve the clarity of the items. The final CLEAR tool was used to assess the quality of health information generated by four distinct AI models on five health topics. The AI models were ChatGPT 3.5, ChatGPT 4, Microsoft Bing, and Google Bard, and the content generated was scored by two independent raters with Cohen's kappa (κ) for inter-rater agreement. Results The final five CLEAR items were: (1) Is the content sufficient?; (2) Is the content accurate?; (3) Is the content evidence-based?; (4) Is the content clear, concise, and easy to understand?; and (5) Is the content free from irrelevant information? Pilot testing on the eight health topics revealed acceptable internal consistency with a Cronbach's α range of 0.669-0.981. The use of the final CLEAR tool yielded the following average scores: Microsoft Bing (mean=24.4±0.42), ChatGPT-4 (mean=23.6±0.96), Google Bard (mean=21.2±1.79), and ChatGPT-3.5 (mean=20.6±5.20). The inter-rater agreement revealed the following Cohen κ values: ChatGPT-3.5 (κ=0.875, P<.001), ChatGPT-4 (κ=0.780, P<.001), Microsoft Bing (κ=0.348, P=.037), and Google Bard (κ=0.749, P<.001). Conclusions The CLEAR tool is a brief yet helpful tool that can aid in standardizing testing of the quality of health information generated by AI-based models. Future studies are recommended to validate the utility of the CLEAR tool in the quality assessment of AI-generated health-related content using a larger sample across various complex health topics. Cureus 2023-11-24 /pmc/articles/PMC10674084/ /pubmed/38024074 http://dx.doi.org/10.7759/cureus.49373 Text en Copyright © 2023, Sallam et al. https://creativecommons.org/licenses/by/3.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
spellingShingle Quality Improvement
Sallam, Malik
Barakat, Muna
Sallam, Mohammed
Pilot Testing of a Tool to Standardize the Assessment of the Quality of Health Information Generated by Artificial Intelligence-Based Models
title Pilot Testing of a Tool to Standardize the Assessment of the Quality of Health Information Generated by Artificial Intelligence-Based Models
title_full Pilot Testing of a Tool to Standardize the Assessment of the Quality of Health Information Generated by Artificial Intelligence-Based Models
title_fullStr Pilot Testing of a Tool to Standardize the Assessment of the Quality of Health Information Generated by Artificial Intelligence-Based Models
title_full_unstemmed Pilot Testing of a Tool to Standardize the Assessment of the Quality of Health Information Generated by Artificial Intelligence-Based Models
title_short Pilot Testing of a Tool to Standardize the Assessment of the Quality of Health Information Generated by Artificial Intelligence-Based Models
title_sort pilot testing of a tool to standardize the assessment of the quality of health information generated by artificial intelligence-based models
topic Quality Improvement
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10674084/
https://www.ncbi.nlm.nih.gov/pubmed/38024074
http://dx.doi.org/10.7759/cureus.49373
work_keys_str_mv AT sallammalik pilottestingofatooltostandardizetheassessmentofthequalityofhealthinformationgeneratedbyartificialintelligencebasedmodels
AT barakatmuna pilottestingofatooltostandardizetheassessmentofthequalityofhealthinformationgeneratedbyartificialintelligencebasedmodels
AT sallammohammed pilottestingofatooltostandardizetheassessmentofthequalityofhealthinformationgeneratedbyartificialintelligencebasedmodels