Testing domain knowledge and risk of bias of a large-scale general artificial intelligence model in mental health

BACKGROUND: With a rapidly expanding gap between the need for and availability of mental health care, artificial intelligence (AI) presents a promising, scalable solution for mental health assessment and treatment. Given the novelty and inscrutable nature of such systems, exploratory measures aimed at understanding their domain knowledge and potential biases are necessary for ongoing translational development and future deployment in high-stakes healthcare settings. METHODS: We investigated the domain knowledge and demographic bias of a generative AI model using contrived clinical vignettes with systematically varied demographic features. We used balanced accuracy (BAC) to quantify the model's performance, and generalized linear mixed-effects models to quantify the relationship between demographic factors and model interpretation. FINDINGS: Model performance varied across diagnoses: attention deficit hyperactivity disorder, posttraumatic stress disorder, alcohol use disorder, narcissistic personality disorder, binge eating disorder, and generalized anxiety disorder showed high BAC (0.70 ≤ BAC ≤ 0.82), whereas bipolar disorder, bulimia nervosa, barbiturate use disorder, conduct disorder, somatic symptom disorder, benzodiazepine use disorder, LSD use disorder, histrionic personality disorder, and functional neurological symptom disorder showed low BAC (BAC ≤ 0.59). INTERPRETATION: Our findings demonstrate initial promise in the domain knowledge of a large AI model, with the variability in performance perhaps attributable to the more salient hallmark symptoms, narrower differential diagnoses, and higher prevalence of some disorders. We found limited evidence of demographic bias, although we did observe some gender and racial differences in model outcomes that mirror real-world differences in prevalence estimates.
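
For reference, balanced accuracy is the mean of per-class recall, so a classifier cannot score well simply by defaulting to the most common diagnoses. The abstract does not specify the authors' implementation; the following is a minimal illustrative sketch using scikit-learn's balanced_accuracy_score, with invented labels that are not data from the study:

    from sklearn.metrics import balanced_accuracy_score

    # Hypothetical diagnosis labels for illustration only -- not study data.
    y_true = ["GAD", "GAD", "PTSD", "ADHD", "PTSD", "ADHD"]
    y_pred = ["GAD", "PTSD", "PTSD", "ADHD", "ADHD", "ADHD"]

    # BAC = mean of per-class recall:
    # recall(GAD) = 1/2, recall(PTSD) = 1/2, recall(ADHD) = 2/2
    print(balanced_accuracy_score(y_true, y_pred))  # ~0.67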

Bibliographic Details
Main Authors: Heinz, Michael V., Bhattacharya, Sukanya, Trudeau, Brianna, Quist, Rachel, Song, Seo Ho, Lee, Camilla M., Jacobson, Nicholas C.
Format: Online Article Text
Language: English
Published: SAGE Publications, 2023
Subjects: Original Research
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10123874/
https://www.ncbi.nlm.nih.gov/pubmed/37101589
http://dx.doi.org/10.1177/20552076231170499
Record ID: pubmed-10123874
Collection: PubMed (National Center for Biotechnology Information)
Record Format: MEDLINE/PubMed
Journal: Digit Health
Published Online: 2023-04-17
License: © The Author(s) 2023. This article is distributed under the terms of the Creative Commons Attribution-NonCommercial 4.0 License (https://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial use, reproduction, and distribution of the work without further permission, provided the original work is attributed as specified on the SAGE and Open Access page (https://us.sagepub.com/en-us/nam/open-access-at-sage).