Testing domain knowledge and risk of bias of a large-scale general artificial intelligence model in mental health
Main Authors: | Heinz, Michael V.; Bhattacharya, Sukanya; Trudeau, Brianna; Quist, Rachel; Song, Seo Ho; Lee, Camilla M.; Jacobson, Nicholas C. |
Format: | Online Article Text |
Language: | English |
Published: | SAGE Publications, 2023 |
Subjects: | Original Research |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10123874/ https://www.ncbi.nlm.nih.gov/pubmed/37101589 http://dx.doi.org/10.1177/20552076231170499 |
_version_ | 1785029741373292544 |
author | Heinz, Michael V.; Bhattacharya, Sukanya; Trudeau, Brianna; Quist, Rachel; Song, Seo Ho; Lee, Camilla M.; Jacobson, Nicholas C. |
author_facet | Heinz, Michael V.; Bhattacharya, Sukanya; Trudeau, Brianna; Quist, Rachel; Song, Seo Ho; Lee, Camilla M.; Jacobson, Nicholas C. |
author_sort | Heinz, Michael V. |
collection | PubMed |
description | BACKGROUND: With a rapidly expanding gap between the need for and availability of mental health care, artificial intelligence (AI) presents a promising, scalable solution to mental health assessment and treatment. Given the novelty and inscrutable nature of such systems, exploratory measures aimed at understanding domain knowledge and potential biases of such systems are necessary for ongoing translational development and future deployment in high-stakes healthcare settings. METHODS: We investigated the domain knowledge and demographic bias of a generative AI model using contrived clinical vignettes with systematically varied demographic features. We used balanced accuracy (BAC) to quantify the model's performance. We used generalized linear mixed-effects models to quantify the relationship between demographic factors and model interpretation. FINDINGS: We found variable model performance across diagnoses; attention deficit hyperactivity disorder, posttraumatic stress disorder, alcohol use disorder, narcissistic personality disorder, binge eating disorder, and generalized anxiety disorder showed high BAC (0.70 ≤ BAC ≤ 0.82); bipolar disorder, bulimia nervosa, barbiturate use disorder, conduct disorder, somatic symptom disorder, benzodiazepine use disorder, LSD use disorder, histrionic personality disorder, and functional neurological symptom disorder showed low BAC (BAC ≤ 0.59). INTERPRETATION: Our findings demonstrate initial promise in the domain knowledge of a large AI model, with performance variability perhaps due to the more salient hallmark symptoms, narrower differential diagnosis, and higher prevalence of some disorders. We found limited evidence of model demographic bias, although we do observe some gender and racial differences in model outcomes mirroring real-world differential prevalence estimates. |
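The balanced accuracy (BAC) metric described in the methods can be illustrated with a minimal sketch. This is not the study's code; it assumes the standard definition of BAC as the mean of per-class recall, so each diagnosis contributes equally regardless of how often it appears in the vignette set. The diagnosis labels below are hypothetical.

```python
from collections import defaultdict

def balanced_accuracy(y_true, y_pred):
    """Mean per-class recall: average, over the true classes,
    of the fraction of that class's cases predicted correctly."""
    hits = defaultdict(int)    # correct predictions per true class
    totals = defaultdict(int)  # occurrences per true class
    for t, p in zip(y_true, y_pred):
        totals[t] += 1
        if t == p:
            hits[t] += 1
    recalls = [hits[c] / totals[c] for c in totals]
    return sum(recalls) / len(recalls)

# Hypothetical labels: "GAD" vignettes are mostly classified correctly,
# "bipolar" vignettes are missed half the time; BAC averages the two recalls.
truth = ["GAD", "GAD", "GAD", "GAD", "bipolar", "bipolar"]
pred  = ["GAD", "GAD", "GAD", "MDD", "MDD",     "bipolar"]
print(balanced_accuracy(truth, pred))  # (3/4 + 1/2) / 2 = 0.625
```

Unlike raw accuracy, BAC is not inflated when one diagnosis dominates the sample, which matters for vignette sets where disorder categories are unevenly represented.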
format | Online Article Text |
id | pubmed-10123874 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2023 |
publisher | SAGE Publications |
record_format | MEDLINE/PubMed |
spelling | pubmed-10123874 2023-04-25 Testing domain knowledge and risk of bias of a large-scale general artificial intelligence model in mental health Heinz, Michael V. Bhattacharya, Sukanya Trudeau, Brianna Quist, Rachel Song, Seo Ho Lee, Camilla M. Jacobson, Nicholas C. Digit Health Original Research BACKGROUND: With a rapidly expanding gap between the need for and availability of mental health care, artificial intelligence (AI) presents a promising, scalable solution to mental health assessment and treatment. Given the novelty and inscrutable nature of such systems, exploratory measures aimed at understanding domain knowledge and potential biases of such systems are necessary for ongoing translational development and future deployment in high-stakes healthcare settings. METHODS: We investigated the domain knowledge and demographic bias of a generative AI model using contrived clinical vignettes with systematically varied demographic features. We used balanced accuracy (BAC) to quantify the model's performance. We used generalized linear mixed-effects models to quantify the relationship between demographic factors and model interpretation. FINDINGS: We found variable model performance across diagnoses; attention deficit hyperactivity disorder, posttraumatic stress disorder, alcohol use disorder, narcissistic personality disorder, binge eating disorder, and generalized anxiety disorder showed high BAC (0.70 ≤ BAC ≤ 0.82); bipolar disorder, bulimia nervosa, barbiturate use disorder, conduct disorder, somatic symptom disorder, benzodiazepine use disorder, LSD use disorder, histrionic personality disorder, and functional neurological symptom disorder showed low BAC (BAC ≤ 0.59). INTERPRETATION: Our findings demonstrate initial promise in the domain knowledge of a large AI model, with performance variability perhaps due to the more salient hallmark symptoms, narrower differential diagnosis, and higher prevalence of some disorders. We found limited evidence of model demographic bias, although we do observe some gender and racial differences in model outcomes mirroring real-world differential prevalence estimates. SAGE Publications 2023-04-17 /pmc/articles/PMC10123874/ /pubmed/37101589 http://dx.doi.org/10.1177/20552076231170499 Text en © The Author(s) 2023 https://creativecommons.org/licenses/by-nc/4.0/ This article is distributed under the terms of the Creative Commons Attribution-NonCommercial 4.0 License (https://creativecommons.org/licenses/by-nc/4.0/) which permits non-commercial use, reproduction and distribution of the work without further permission provided the original work is attributed as specified on the SAGE and Open Access page (https://us.sagepub.com/en-us/nam/open-access-at-sage). |
spellingShingle | Original Research Heinz, Michael V. Bhattacharya, Sukanya Trudeau, Brianna Quist, Rachel Song, Seo Ho Lee, Camilla M. Jacobson, Nicholas C. Testing domain knowledge and risk of bias of a large-scale general artificial intelligence model in mental health |
title | Testing domain knowledge and risk of bias of a large-scale general artificial intelligence model in mental health |
title_full | Testing domain knowledge and risk of bias of a large-scale general artificial intelligence model in mental health |
title_fullStr | Testing domain knowledge and risk of bias of a large-scale general artificial intelligence model in mental health |
title_full_unstemmed | Testing domain knowledge and risk of bias of a large-scale general artificial intelligence model in mental health |
title_short | Testing domain knowledge and risk of bias of a large-scale general artificial intelligence model in mental health |
title_sort | testing domain knowledge and risk of bias of a large-scale general artificial intelligence model in mental health |
topic | Original Research |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10123874/ https://www.ncbi.nlm.nih.gov/pubmed/37101589 http://dx.doi.org/10.1177/20552076231170499 |
work_keys_str_mv | AT heinzmichaelv testingdomainknowledgeandriskofbiasofalargescalegeneralartificialintelligencemodelinmentalhealth AT bhattacharyasukanya testingdomainknowledgeandriskofbiasofalargescalegeneralartificialintelligencemodelinmentalhealth AT trudeaubrianna testingdomainknowledgeandriskofbiasofalargescalegeneralartificialintelligencemodelinmentalhealth AT quistrachel testingdomainknowledgeandriskofbiasofalargescalegeneralartificialintelligencemodelinmentalhealth AT songseoho testingdomainknowledgeandriskofbiasofalargescalegeneralartificialintelligencemodelinmentalhealth AT leecamillam testingdomainknowledgeandriskofbiasofalargescalegeneralartificialintelligencemodelinmentalhealth AT jacobsonnicholasc testingdomainknowledgeandriskofbiasofalargescalegeneralartificialintelligencemodelinmentalhealth |