Suicide Risk Assessments Through the Eyes of ChatGPT-3.5 Versus ChatGPT-4: Vignette Study


Bibliographic Details
Main Authors: Levkovich, Inbar; Elyoseph, Zohar
Format: Online Article Text
Language: English
Published: JMIR Publications 2023
Subjects: Original Paper
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10551796/
https://www.ncbi.nlm.nih.gov/pubmed/37728984
http://dx.doi.org/10.2196/51232
author Levkovich, Inbar
Elyoseph, Zohar
collection PubMed
description BACKGROUND: ChatGPT, a linguistic artificial intelligence (AI) model engineered by OpenAI, offers prospective contributions to mental health professionals. Although the model has significant theoretical implications, its practical capabilities, particularly regarding suicide prevention, have not yet been substantiated.
OBJECTIVE: The study's aim was to evaluate ChatGPT's ability to assess suicide risk over a 2-month period, taking into consideration 2 discernible factors: perceived burdensomeness and thwarted belongingness. In addition, we evaluated whether ChatGPT-4 assessed suicide risk more accurately than ChatGPT-3.5.
METHODS: ChatGPT was tasked with assessing a vignette that depicted a hypothetical patient exhibiting differing degrees of perceived burdensomeness and thwarted belongingness. The assessments generated by ChatGPT were then compared with standard evaluations rendered by mental health professionals. Using both ChatGPT-3.5 and ChatGPT-4 (May 24, 2023), we executed 3 evaluative procedures in June and July 2023. Our intent was to scrutinize ChatGPT-4's proficiency in assessing various facets of suicide risk relative to the evaluative abilities of both mental health professionals and an earlier version of ChatGPT-3.5 (March 14 version).
RESULTS: In June and July 2023, the likelihood of suicide attempts as evaluated by ChatGPT-4 was similar to the norms of mental health professionals (n=379) under all conditions (average Z score of 0.01). By contrast, ChatGPT-3.5 (May version) markedly underestimated the potential for suicide attempts relative to the professionals' assessments (average Z score of –0.83). ChatGPT-4's evaluations of the incidence of suicidal ideation and psychache were higher than those of the mental health professionals (average Z scores of 0.47 and 1.00, respectively). Conversely, the level of resilience assessed by both ChatGPT-4 and ChatGPT-3.5 (both versions) was lower than in the professionals' assessments (average Z scores of –0.89 and –0.90, respectively).
CONCLUSIONS: The findings suggest that ChatGPT-4 estimates the likelihood of suicide attempts in a manner akin to professional evaluations and appears to be more precise in recognizing suicidal ideation. However, ChatGPT-4 overestimated psychache, indicating a need for further research. These results have implications for ChatGPT-4's potential to support decision-making by gatekeepers, patients, and even mental health professionals. Despite this clinical potential, intensive follow-up studies are necessary before ChatGPT-4's capabilities can be relied on in clinical practice. The finding that ChatGPT-3.5 frequently underestimates suicide risk, especially in severe cases, is particularly troubling: it indicates that ChatGPT may downplay a person's actual suicide risk level.
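The "average Z score" metric in the abstract standardizes each model rating against the mental health professionals' norm for the same vignette condition and then averages across conditions. Below is a minimal Python sketch of that computation; it is not the authors' code, and the condition names, norm means and SDs, and model ratings are hypothetical placeholders rather than data from the study.

```python
# Minimal sketch (not the authors' code) of the "average Z score" comparison:
# each model rating is standardized against the professionals' norm for the
# same vignette condition, then the Z scores are averaged.
# All numbers below are hypothetical placeholders, not data from the study.

from statistics import mean

# Professional norms per vignette condition: (mean, sd) of clinicians'
# ratings of the likelihood of a suicide attempt (hypothetical values).
norms = {
    "low_burden_low_belong": (2.1, 0.9),
    "low_burden_high_belong": (1.6, 0.8),
    "high_burden_low_belong": (4.3, 1.1),
    "high_burden_high_belong": (3.0, 1.0),
}

# The model's rating of the same item under each condition (hypothetical).
model_ratings = {
    "low_burden_low_belong": 2.2,
    "low_burden_high_belong": 1.5,
    "high_burden_low_belong": 4.4,
    "high_burden_high_belong": 3.1,
}

def z_score(rating: float, norm_mean: float, norm_sd: float) -> float:
    """Standardize one rating against the professional norm for its condition."""
    return (rating - norm_mean) / norm_sd

z_scores = [z_score(model_ratings[cond], m, sd) for cond, (m, sd) in norms.items()]

# An average Z near 0 means the model tracks professional norms; a strongly
# negative average indicates systematic underestimation of risk.
print(f"average Z score: {mean(z_scores):+.2f}")
```

Run as-is, the sketch prints an average Z near 0, the pattern reported for ChatGPT-4's attempt-likelihood ratings; uniformly lowering the model ratings drives the average negative, the pattern reported for ChatGPT-3.5 (average Z score of –0.83).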
format Online
Article
Text
id pubmed-10551796
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher JMIR Publications
record_format MEDLINE/PubMed
spelling pubmed-10551796 2023-10-06. Suicide Risk Assessments Through the Eyes of ChatGPT-3.5 Versus ChatGPT-4: Vignette Study. Levkovich, Inbar; Elyoseph, Zohar. JMIR Ment Health, Original Paper. JMIR Publications, 2023-09-20. /pmc/articles/PMC10551796/ /pubmed/37728984 http://dx.doi.org/10.2196/51232 Text en ©Inbar Levkovich, Zohar Elyoseph. Originally published in JMIR Mental Health (https://mental.jmir.org), 20.09.2023.
https://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in JMIR Mental Health, is properly cited. The complete bibliographic information, a link to the original publication on https://mental.jmir.org/, as well as this copyright and license information must be included.
title Suicide Risk Assessments Through the Eyes of ChatGPT-3.5 Versus ChatGPT-4: Vignette Study
topic Original Paper
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10551796/
https://www.ncbi.nlm.nih.gov/pubmed/37728984
http://dx.doi.org/10.2196/51232