
Bias and Inaccuracy in AI Chatbot Ophthalmologist Recommendations

Bibliographic Details
Main Authors: Oca, Michael C; Meller, Leo; Wilson, Katherine; Parikh, Alomi O; McCoy, Allison; Chang, Jessica; Sudharshan, Rasika; Gupta, Shreya; Zhang-Nunes, Sandy
Format: Online Article Text
Language: English
Published: Cureus, 2023
Subjects: Medical Education
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10599183/
https://www.ncbi.nlm.nih.gov/pubmed/37885556
http://dx.doi.org/10.7759/cureus.45911

Full Description

Purpose and design: To evaluate the accuracy and bias of ophthalmologist recommendations made by three AI chatbots: ChatGPT 3.5 (OpenAI, San Francisco, CA, USA), Bing Chat (Microsoft Corp., Redmond, WA, USA), and Google Bard (Alphabet Inc., Mountain View, CA, USA). The study analyzed chatbot recommendations for the 20 most populous U.S. cities.

Methods: Each chatbot returned 80 total recommendations when given the prompt “Find me four good ophthalmologists in (city).” Characteristics of the recommended physicians, including specialty, location, gender, practice type, and fellowship, were collected. A one-proportion z-test was performed to compare the proportion of female ophthalmologists recommended by each chatbot to the national average (27.2% per the Association of American Medical Colleges (AAMC)). Pearson’s chi-squared test was performed to determine differences between the three chatbots in male versus female recommendations and in recommendation accuracy.

Results: The proportions of female ophthalmologists recommended by Bing Chat (1.61%) and Bard (8.0%) were significantly lower than the national proportion of 27.2% practicing female ophthalmologists (p<0.001 and p<0.01, respectively). ChatGPT recommended fewer female (29.5%) than male ophthalmologists, though this proportion did not differ significantly from the national average (p=0.722). ChatGPT (73.8%), Bing Chat (67.5%), and Bard (62.5%) all produced high rates of inaccurate recommendations. Compared with the national proportion of academic ophthalmologists (17%), the proportion of recommended ophthalmologists in academic medicine or in combined academic and private practice was significantly greater for all three chatbots.

Conclusion: This study revealed substantial bias and inaccuracy in the AI chatbots’ recommendations. The chatbots struggled to recommend ophthalmologists reliably and accurately, with most recommendations being physicians in specialties other than ophthalmology or not in or near the desired city. Bing Chat and Google Bard showed a significant tendency against recommending female ophthalmologists, and all three chatbots favored ophthalmologists in academic medicine.

Record ID: pubmed-10599183
Collection: PubMed
Institution: National Center for Biotechnology Information
Record Format: MEDLINE/PubMed
Published Online: 2023-09-25
Copyright: © 2023, Oca et al. This is an open access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/3.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
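
The statistical comparisons named in the Methods (a one-proportion z-test against the 27.2% AAMC benchmark, and Pearson’s chi-squared test across the three chatbots) can be illustrated with a minimal sketch. This is not the authors' code, and the counts below are hypothetical placeholders (the record does not give raw counts); only the form of the calculation, using statsmodels and SciPy, is shown.

# Minimal sketch, not the study's code: the two tests described in the Methods.
# All counts are hypothetical placeholders for illustration only.
from statsmodels.stats.proportion import proportions_ztest
from scipy.stats import chi2_contingency

AAMC_FEMALE_PROP = 0.272  # national proportion of practicing female ophthalmologists (AAMC)

# One-proportion z-test: one chatbot's share of female recommendations vs. the
# national benchmark (hypothetical: 18 female out of 61 verifiable ophthalmologists).
z_stat, p_value = proportions_ztest(count=18, nobs=61, value=AAMC_FEMALE_PROP)
print(f"one-proportion z-test: z = {z_stat:.2f}, p = {p_value:.3f}")

# Pearson chi-squared test: female vs. male recommendation counts across the
# three chatbots (rows = ChatGPT, Bing Chat, Bard; columns = female, male).
observed = [[18, 43],   # hypothetical ChatGPT counts
            [1, 61],    # hypothetical Bing Chat counts
            [5, 57]]    # hypothetical Bard counts
chi2, p, dof, expected = chi2_contingency(observed)
print(f"chi-squared: chi2 = {chi2:.2f}, dof = {dof}, p = {p:.3f}")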