
Evaluating the Sensitivity, Specificity, and Accuracy of ChatGPT-3.5, ChatGPT-4, Bing AI, and Bard Against Conventional Drug-Drug Interactions Clinical Tools


Bibliographic Details
Main Authors: Al-Ashwal, Fahmi Y, Zawiah, Mohammed, Gharaibeh, Lobna, Abu-Farha, Rana, Bitar, Ahmad Naoras
Format: Online Article Text
Language: English
Published: Dove 2023
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10518176/
https://www.ncbi.nlm.nih.gov/pubmed/37750052
http://dx.doi.org/10.2147/DHPS.S425858
author Al-Ashwal, Fahmi Y
Zawiah, Mohammed
Gharaibeh, Lobna
Abu-Farha, Rana
Bitar, Ahmad Naoras
collection PubMed
description BACKGROUND: AI platforms are equipped with advanced algorithms that have the potential to offer a wide range of applications in healthcare services. However, information about the accuracy of AI chatbots against conventional drug-drug interaction tools is limited. This study aimed to assess the sensitivity, specificity, and accuracy of ChatGPT-3.5, ChatGPT-4, Bing AI, and Bard in predicting drug-drug interactions. METHODS: AI-based chatbots (ie, ChatGPT-3.5, ChatGPT-4, Microsoft Bing AI, and Google Bard) were compared for their abilities to detect clinically relevant DDIs for 255 drug pairs. Descriptive statistics, such as specificity, sensitivity, accuracy, negative predictive value (NPV), and positive predictive value (PPV), were calculated for each tool. RESULTS: When a subscription tool was used as a reference, the specificity ranged from a low of 0.372 (ChatGPT-3.5) to a high of 0.769 (Microsoft Bing AI). Also, Microsoft Bing AI had the highest performance with an accuracy score of 0.788, with ChatGPT-3.5 having the lowest accuracy rate of 0.469. There was an overall improvement in performance for all the programs when the reference tool switched to a free DDI source, but still, ChatGPT-3.5 had the lowest specificity (0.392) and accuracy (0.525), and Microsoft Bing AI demonstrated the highest specificity (0.892) and accuracy (0.890). When assessing the consistency of accuracy across two different drug classes, ChatGPT-3.5 and ChatGPT-4 showed the highest variability in accuracy. In addition, ChatGPT-3.5, ChatGPT-4, and Bard exhibited the highest fluctuations in specificity when analyzing two medications belonging to the same drug class. CONCLUSION: Bing AI had the highest accuracy and specificity, outperforming Google's Bard, ChatGPT-3.5, and ChatGPT-4. The findings highlight the significant potential these AI tools hold in transforming patient care. While the current AI platforms evaluated are not without limitations, their ability to quickly analyze potentially significant interactions with good sensitivity suggests a promising step towards improved patient safety.
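The metrics reported in the abstract (sensitivity, specificity, accuracy, PPV, NPV) all derive from a 2x2 confusion matrix of chatbot verdicts against the reference DDI tool. A minimal sketch of those standard definitions in Python; the counts in the usage example are illustrative only and are not taken from the study:

```python
def dx_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Standard diagnostic-test metrics from confusion-matrix counts.

    tp/fp/tn/fn: true/false positives and negatives, where "positive"
    means the tool flagged a clinically relevant drug-drug interaction.
    """
    return {
        "sensitivity": tp / (tp + fn),            # true-positive rate
        "specificity": tn / (tn + fp),            # true-negative rate
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "ppv": tp / (tp + fp),                    # positive predictive value
        "npv": tn / (tn + fn),                    # negative predictive value
    }

# Hypothetical counts for a set of 255 drug pairs (illustrative only).
m = dx_metrics(tp=150, fp=30, tn=70, fn=5)
print({k: round(v, 3) for k, v in m.items()})
```

Note that with an imbalanced mix of interacting and non-interacting pairs, accuracy alone can mask poor specificity, which is why the study reports all five measures.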
format Online
Article
Text
id pubmed-10518176
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher Dove
record_format MEDLINE/PubMed
spelling pubmed-105181762023-09-25 Evaluating the Sensitivity, Specificity, and Accuracy of ChatGPT-3.5, ChatGPT-4, Bing AI, and Bard Against Conventional Drug-Drug Interactions Clinical Tools Al-Ashwal, Fahmi Y Zawiah, Mohammed Gharaibeh, Lobna Abu-Farha, Rana Bitar, Ahmad Naoras Drug Healthc Patient Saf Original Research BACKGROUND: AI platforms are equipped with advanced algorithms that have the potential to offer a wide range of applications in healthcare services. However, information about the accuracy of AI chatbots against conventional drug-drug interaction tools is limited. This study aimed to assess the sensitivity, specificity, and accuracy of ChatGPT-3.5, ChatGPT-4, Bing AI, and Bard in predicting drug-drug interactions. METHODS: AI-based chatbots (ie, ChatGPT-3.5, ChatGPT-4, Microsoft Bing AI, and Google Bard) were compared for their abilities to detect clinically relevant DDIs for 255 drug pairs. Descriptive statistics, such as specificity, sensitivity, accuracy, negative predictive value (NPV), and positive predictive value (PPV), were calculated for each tool. RESULTS: When a subscription tool was used as a reference, the specificity ranged from a low of 0.372 (ChatGPT-3.5) to a high of 0.769 (Microsoft Bing AI). Also, Microsoft Bing AI had the highest performance with an accuracy score of 0.788, with ChatGPT-3.5 having the lowest accuracy rate of 0.469. There was an overall improvement in performance for all the programs when the reference tool switched to a free DDI source, but still, ChatGPT-3.5 had the lowest specificity (0.392) and accuracy (0.525), and Microsoft Bing AI demonstrated the highest specificity (0.892) and accuracy (0.890). When assessing the consistency of accuracy across two different drug classes, ChatGPT-3.5 and ChatGPT-4 showed the highest variability in accuracy. In addition, ChatGPT-3.5, ChatGPT-4, and Bard exhibited the highest fluctuations in specificity when analyzing two medications belonging to the same drug class. CONCLUSION: Bing AI had the highest accuracy and specificity, outperforming Google's Bard, ChatGPT-3.5, and ChatGPT-4. The findings highlight the significant potential these AI tools hold in transforming patient care. While the current AI platforms evaluated are not without limitations, their ability to quickly analyze potentially significant interactions with good sensitivity suggests a promising step towards improved patient safety. Dove 2023-09-20 /pmc/articles/PMC10518176/ /pubmed/37750052 http://dx.doi.org/10.2147/DHPS.S425858 Text en © 2023 Al-Ashwal et al. https://creativecommons.org/licenses/by-nc/3.0/ This work is published and licensed by Dove Medical Press Limited. The full terms of this license are available at https://www.dovepress.com/terms.php and incorporate the Creative Commons Attribution – Non Commercial (unported, v3.0) License (http://creativecommons.org/licenses/by-nc/3.0/). By accessing the work you hereby accept the Terms. Non-commercial uses of the work are permitted without any further permission from Dove Medical Press Limited, provided the work is properly attributed. For permission for commercial use of this work, please see paragraphs 4.2 and 5 of our Terms (https://www.dovepress.com/terms.php).
title Evaluating the Sensitivity, Specificity, and Accuracy of ChatGPT-3.5, ChatGPT-4, Bing AI, and Bard Against Conventional Drug-Drug Interactions Clinical Tools
topic Original Research
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10518176/
https://www.ncbi.nlm.nih.gov/pubmed/37750052
http://dx.doi.org/10.2147/DHPS.S425858