Assessing Racial and Ethnic Bias in Text Generation for Healthcare-Related Tasks by ChatGPT(1)

Large language models (LLMs) are AI tools that can respond in a human-like manner to voice or free-text commands without being trained on specific tasks. However, concerns have been raised about potential racial bias when these models are applied to healthcare tasks. In this study, ChatGPT was used to generate healthcare-related text for patients with HIV, analyzing data from 100 deidentified electronic health record encounters. Each patient’s data were submitted four times, with all information remaining the same except for race/ethnicity (African American, Asian, Hispanic White, Non-Hispanic White). The text output was analyzed for sentiment, subjectivity, reading ease, and most frequently used words by race/ethnicity and insurance type. Results showed that instructions for African American, Asian, Hispanic White, and Non-Hispanic White patients had an average polarity of 0.14, 0.14, 0.15, and 0.14, respectively, with an average subjectivity of 0.46 for all races/ethnicities. The differences in polarity and subjectivity across races/ethnicities were not statistically significant. However, there was a statistically significant difference in word frequency across races/ethnicities, as well as a statistically significant difference in subjectivity across insurance types, with commercial insurance eliciting the most subjective responses and Medicare and other payer types the least. The study suggests that ChatGPT is relatively invariant to race/ethnicity and insurance type in terms of linguistic and readability measures. Further studies are needed to validate these results and assess their implications.
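
The analysis pipeline the abstract describes (per-group sentiment polarity, subjectivity, reading ease, and word frequency, plus a significance test across groups) can be sketched as follows. This is a minimal illustration assuming TextBlob-style sentiment scoring, the textstat package for Flesch reading ease, and a Kruskal-Wallis test; the paper does not name its libraries or exact statistical test, and the input texts below are hypothetical stand-ins for the ChatGPT-generated patient instructions.

```python
# Sketch of the per-group text analysis described in the abstract.
# Assumptions (not stated in the paper): TextBlob for polarity and
# subjectivity, textstat for Flesch reading ease, scipy's Kruskal-Wallis
# test for the group comparison. Sample texts are hypothetical.
from collections import Counter, defaultdict
from textblob import TextBlob
from scipy.stats import kruskal
import textstat

# (race/ethnicity, generated text) pairs; the study submitted each of
# 100 encounters four times, varying only the race/ethnicity label.
outputs = [
    ("African American", "It is very important to take your medication with food."),
    ("African American", "Please schedule a follow-up visit in two weeks."),
    ("Non-Hispanic White", "It is important to rest and drink fluids daily."),
    ("Non-Hispanic White", "Take one tablet by mouth every morning."),
]

metrics = defaultdict(lambda: defaultdict(list))
word_counts = defaultdict(Counter)

for group, text in outputs:
    sentiment = TextBlob(text).sentiment
    metrics[group]["polarity"].append(sentiment.polarity)          # range [-1, 1]
    metrics[group]["subjectivity"].append(sentiment.subjectivity)  # range [0, 1]
    metrics[group]["reading_ease"].append(textstat.flesch_reading_ease(text))
    word_counts[group].update(w.lower().strip(".,") for w in text.split())

# Nonparametric comparison of subjectivity across groups (one reasonable
# choice; the paper does not specify which test it used).
stat, p = kruskal(*(m["subjectivity"] for m in metrics.values()))
print(f"Kruskal-Wallis on subjectivity: H = {stat:.2f}, p = {p:.3f}")
```

TextBlob reports polarity in [-1, 1] and subjectivity in [0, 1], which matches the ranges of the averages reported in the abstract (polarity 0.14, subjectivity 0.46), but the study's actual tooling is not specified here.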

Bibliographic Details
Main Authors: Hanna, John J., Wakene, Abdi D., Lehmann, Christoph U., Medford, Richard J.
Format: Online Article Text
Language: English
Published: Cold Spring Harbor Laboratory 2023
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10491360/
https://www.ncbi.nlm.nih.gov/pubmed/37693388
http://dx.doi.org/10.1101/2023.08.28.23294730
_version_ 1785104044644106240
author Hanna, John J.
Wakene, Abdi D.
Lehmann, Christoph U.
Medford, Richard J.
author_facet Hanna, John J.
Wakene, Abdi D.
Lehmann, Christoph U.
Medford, Richard J.
author_sort Hanna, John J.
collection PubMed
description Large language models (LLMs) are AI tools that can respond in a human-like manner to voice or free-text commands without being trained on specific tasks. However, concerns have been raised about potential racial bias when these models are applied to healthcare tasks. In this study, ChatGPT was used to generate healthcare-related text for patients with HIV, analyzing data from 100 deidentified electronic health record encounters. Each patient’s data were submitted four times, with all information remaining the same except for race/ethnicity (African American, Asian, Hispanic White, Non-Hispanic White). The text output was analyzed for sentiment, subjectivity, reading ease, and most frequently used words by race/ethnicity and insurance type. Results showed that instructions for African American, Asian, Hispanic White, and Non-Hispanic White patients had an average polarity of 0.14, 0.14, 0.15, and 0.14, respectively, with an average subjectivity of 0.46 for all races/ethnicities. The differences in polarity and subjectivity across races/ethnicities were not statistically significant. However, there was a statistically significant difference in word frequency across races/ethnicities, as well as a statistically significant difference in subjectivity across insurance types, with commercial insurance eliciting the most subjective responses and Medicare and other payer types the least. The study suggests that ChatGPT is relatively invariant to race/ethnicity and insurance type in terms of linguistic and readability measures. Further studies are needed to validate these results and assess their implications.
format Online
Article
Text
id pubmed-10491360
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher Cold Spring Harbor Laboratory
record_format MEDLINE/PubMed
spelling pubmed-10491360 2023-09-09 Assessing Racial and Ethnic Bias in Text Generation for Healthcare-Related Tasks by ChatGPT(1) Hanna, John J. Wakene, Abdi D. Lehmann, Christoph U. Medford, Richard J. medRxiv Article Large language models (LLMs) are AI tools that can respond in a human-like manner to voice or free-text commands without being trained on specific tasks. However, concerns have been raised about potential racial bias when these models are applied to healthcare tasks. In this study, ChatGPT was used to generate healthcare-related text for patients with HIV, analyzing data from 100 deidentified electronic health record encounters. Each patient’s data were submitted four times, with all information remaining the same except for race/ethnicity (African American, Asian, Hispanic White, Non-Hispanic White). The text output was analyzed for sentiment, subjectivity, reading ease, and most frequently used words by race/ethnicity and insurance type. Results showed that instructions for African American, Asian, Hispanic White, and Non-Hispanic White patients had an average polarity of 0.14, 0.14, 0.15, and 0.14, respectively, with an average subjectivity of 0.46 for all races/ethnicities. The differences in polarity and subjectivity across races/ethnicities were not statistically significant. However, there was a statistically significant difference in word frequency across races/ethnicities, as well as a statistically significant difference in subjectivity across insurance types, with commercial insurance eliciting the most subjective responses and Medicare and other payer types the least. The study suggests that ChatGPT is relatively invariant to race/ethnicity and insurance type in terms of linguistic and readability measures. Further studies are needed to validate these results and assess their implications. Cold Spring Harbor Laboratory 2023-08-28 /pmc/articles/PMC10491360/ /pubmed/37693388 http://dx.doi.org/10.1101/2023.08.28.23294730 Text en https://creativecommons.org/licenses/by-nd/4.0/ This work is licensed under a Creative Commons Attribution-NoDerivatives 4.0 International License (https://creativecommons.org/licenses/by-nd/4.0/), which allows reusers to copy and distribute the material in any medium or format in unadapted form only, and only so long as attribution is given to the creator. The license allows for commercial use.
spellingShingle Article
Hanna, John J.
Wakene, Abdi D.
Lehmann, Christoph U.
Medford, Richard J.
Assessing Racial and Ethnic Bias in Text Generation for Healthcare-Related Tasks by ChatGPT(1)
title Assessing Racial and Ethnic Bias in Text Generation for Healthcare-Related Tasks by ChatGPT(1)
title_full Assessing Racial and Ethnic Bias in Text Generation for Healthcare-Related Tasks by ChatGPT(1)
title_fullStr Assessing Racial and Ethnic Bias in Text Generation for Healthcare-Related Tasks by ChatGPT(1)
title_full_unstemmed Assessing Racial and Ethnic Bias in Text Generation for Healthcare-Related Tasks by ChatGPT(1)
title_short Assessing Racial and Ethnic Bias in Text Generation for Healthcare-Related Tasks by ChatGPT(1)
title_sort assessing racial and ethnic bias in text generation for healthcare-related tasks by chatgpt(1)
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10491360/
https://www.ncbi.nlm.nih.gov/pubmed/37693388
http://dx.doi.org/10.1101/2023.08.28.23294730
work_keys_str_mv AT hannajohnj assessingracialandethnicbiasintextgenerationforhealthcarerelatedtasksbychatgpt1
AT wakeneabdid assessingracialandethnicbiasintextgenerationforhealthcarerelatedtasksbychatgpt1
AT lehmannchristophu assessingracialandethnicbiasintextgenerationforhealthcarerelatedtasksbychatgpt1
AT medfordrichardj assessingracialandethnicbiasintextgenerationforhealthcarerelatedtasksbychatgpt1