Bias and Inaccuracy in AI Chatbot Ophthalmologist Recommendations
Main Authors: | Oca, Michael C; Meller, Leo; Wilson, Katherine; Parikh, Alomi O; McCoy, Allison; Chang, Jessica; Sudharshan, Rasika; Gupta, Shreya; Zhang-Nunes, Sandy |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Cureus 2023 |
Subjects: | Medical Education |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10599183/ https://www.ncbi.nlm.nih.gov/pubmed/37885556 http://dx.doi.org/10.7759/cureus.45911 |
_version_ | 1785125719145185280 |
---|---|
author | Oca, Michael C; Meller, Leo; Wilson, Katherine; Parikh, Alomi O; McCoy, Allison; Chang, Jessica; Sudharshan, Rasika; Gupta, Shreya; Zhang-Nunes, Sandy |
author_facet | Oca, Michael C; Meller, Leo; Wilson, Katherine; Parikh, Alomi O; McCoy, Allison; Chang, Jessica; Sudharshan, Rasika; Gupta, Shreya; Zhang-Nunes, Sandy |
author_sort | Oca, Michael C |
collection | PubMed |
description | Purpose and design: To evaluate the accuracy and bias of ophthalmologist recommendations made by three AI chatbots: ChatGPT 3.5 (OpenAI, San Francisco, CA, USA), Bing Chat (Microsoft Corp., Redmond, WA, USA), and Google Bard (Alphabet Inc., Mountain View, CA, USA). This study analyzed chatbot recommendations for the 20 most populous U.S. cities. Methods: Each chatbot returned 80 total recommendations when given the prompt “Find me four good ophthalmologists in (city).” Characteristics of the physicians, including specialty, location, gender, practice type, and fellowship, were collected. A one-proportion z-test was performed to compare the proportion of female ophthalmologists recommended by each chatbot to the national average (27.2% per the Association of American Medical Colleges (AAMC)). Pearson’s chi-squared test was performed to determine differences between the three chatbots in male versus female recommendations and in recommendation accuracy. Results: The proportions of female ophthalmologists recommended by Bing Chat (1.61%) and Bard (8.0%) were significantly lower than the national proportion of 27.2% practicing female ophthalmologists (p<0.001 and p<0.01, respectively). ChatGPT recommended fewer female (29.5%) than male ophthalmologists, a proportion not significantly different from the national average (p=0.722). ChatGPT (73.8%), Bing Chat (67.5%), and Bard (62.5%) gave high rates of inaccurate recommendations. Compared to the national average of academic ophthalmologists (17%), the proportion of recommended ophthalmologists in academic medicine or in combined academic and private practice was significantly greater for all three chatbots. Conclusion: This study revealed substantial bias and inaccuracy in the AI chatbots’ recommendations. They struggled to recommend ophthalmologists reliably and accurately, with most recommendations being physicians in specialties other than ophthalmology or not in or near the desired city. Bing Chat and Google Bard showed a significant tendency against recommending female ophthalmologists, and all chatbots favored recommending ophthalmologists in academic medicine. |
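The one-proportion z-test named in the Methods can be sketched as below. The counts are hypothetical: the abstract reports only percentages (e.g. 1.61% female for Bing Chat), so 1 of 62 is an inferred denominator used purely for illustration, not the authors' actual data.

```python
from math import sqrt, erfc

def one_prop_ztest(successes: int, n: int, p0: float) -> tuple[float, float]:
    """Two-sided one-proportion z-test of an observed proportion against p0."""
    p_hat = successes / n
    se = sqrt(p0 * (1 - p0) / n)          # standard error under the null hypothesis
    z = (p_hat - p0) / se
    p_value = erfc(abs(z) / sqrt(2))      # two-sided normal tail probability
    return z, p_value

# Hypothetical counts: 1 female of 62 named ophthalmologists (~1.61%),
# tested against the AAMC national proportion of 27.2%.
z, p = one_prop_ztest(1, 62, 0.272)
print(f"z = {z:.2f}, p = {p:.2g}")        # p < 0.001, consistent with the abstract
```

With any plausible denominator near 62, the observed 1.61% sits several standard errors below 27.2%, which is why the abstract can report p<0.001 without giving raw counts.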
format | Online Article Text |
id | pubmed-10599183 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2023 |
publisher | Cureus |
record_format | MEDLINE/PubMed |
spelling | pubmed-105991832023-10-26 Bias and Inaccuracy in AI Chatbot Ophthalmologist Recommendations Oca, Michael C; Meller, Leo; Wilson, Katherine; Parikh, Alomi O; McCoy, Allison; Chang, Jessica; Sudharshan, Rasika; Gupta, Shreya; Zhang-Nunes, Sandy Cureus Medical Education Cureus 2023-09-25 /pmc/articles/PMC10599183/ /pubmed/37885556 http://dx.doi.org/10.7759/cureus.45911 Text en Copyright © 2023, Oca et al. https://creativecommons.org/licenses/by/3.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. |
spellingShingle | Medical Education Oca, Michael C Meller, Leo Wilson, Katherine Parikh, Alomi O McCoy, Allison Chang, Jessica Sudharshan, Rasika Gupta, Shreya Zhang-Nunes, Sandy Bias and Inaccuracy in AI Chatbot Ophthalmologist Recommendations |
title | Bias and Inaccuracy in AI Chatbot Ophthalmologist Recommendations |
title_full | Bias and Inaccuracy in AI Chatbot Ophthalmologist Recommendations |
title_fullStr | Bias and Inaccuracy in AI Chatbot Ophthalmologist Recommendations |
title_full_unstemmed | Bias and Inaccuracy in AI Chatbot Ophthalmologist Recommendations |
title_short | Bias and Inaccuracy in AI Chatbot Ophthalmologist Recommendations |
title_sort | bias and inaccuracy in ai chatbot ophthalmologist recommendations |
topic | Medical Education |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10599183/ https://www.ncbi.nlm.nih.gov/pubmed/37885556 http://dx.doi.org/10.7759/cureus.45911 |
work_keys_str_mv | AT ocamichaelc biasandinaccuracyinaichatbotophthalmologistrecommendations AT mellerleo biasandinaccuracyinaichatbotophthalmologistrecommendations AT wilsonkatherine biasandinaccuracyinaichatbotophthalmologistrecommendations AT parikhalomio biasandinaccuracyinaichatbotophthalmologistrecommendations AT mccoyallison biasandinaccuracyinaichatbotophthalmologistrecommendations AT changjessica biasandinaccuracyinaichatbotophthalmologistrecommendations AT sudharshanrasika biasandinaccuracyinaichatbotophthalmologistrecommendations AT guptashreya biasandinaccuracyinaichatbotophthalmologistrecommendations AT zhangnunessandy biasandinaccuracyinaichatbotophthalmologistrecommendations |