
Enabling Early Health Care Intervention by Detecting Depression in Users of Web-Based Forums using Language Models: Longitudinal Analysis and Evaluation

BACKGROUND: Major depressive disorder is a common mental disorder affecting 5% of adults worldwide. Early contact with health care services is critical for achieving accurate diagnosis and improving patient outcomes. Key symptoms of major depressive disorder (depression hereafter) such as cognitive...


Bibliographic Details
Main authors: Owen, David; Antypas, Dimosthenis; Hassoulas, Athanasios; Pardiñas, Antonio F; Espinosa-Anke, Luis; Collados, Jose Camacho
Format: Online Article Text
Language: English
Published: 2023
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7614849/
https://www.ncbi.nlm.nih.gov/pubmed/37525646
http://dx.doi.org/10.2196/41205
_version_ 1783605657847463936
author Owen, David
Antypas, Dimosthenis
Hassoulas, Athanasios
Pardiñas, Antonio F
Espinosa-Anke, Luis
Collados, Jose Camacho
author_facet Owen, David
Antypas, Dimosthenis
Hassoulas, Athanasios
Pardiñas, Antonio F
Espinosa-Anke, Luis
Collados, Jose Camacho
author_sort Owen, David
collection PubMed
description BACKGROUND: Major depressive disorder is a common mental disorder affecting 5% of adults worldwide. Early contact with health care services is critical for achieving accurate diagnosis and improving patient outcomes. Key symptoms of major depressive disorder (depression hereafter) such as cognitive distortions are observed in verbal communication, which can also manifest in the structure of written language. Thus, the automatic analysis of text outputs may provide opportunities for early intervention in settings where written communication is rich and regular, such as social media and web-based forums. OBJECTIVE: The objective of this study was 2-fold. We sought to gauge the effectiveness of different machine learning approaches to identify users of the mass web-based forum Reddit, who eventually disclose a diagnosis of depression. We then aimed to determine whether the time between a forum post and a depression diagnosis date was a relevant factor in performing this detection. METHODS: A total of 2 Reddit data sets containing posts belonging to users with and without a history of depression diagnosis were obtained. The intersection of these data sets provided users with an estimated date of depression diagnosis. This derived data set was used as an input for several machine learning classifiers, including transformer-based language models (LMs). RESULTS: Bidirectional Encoder Representations from Transformers (BERT) and MentalBERT transformer-based LMs proved the most effective in distinguishing forum users with a known depression diagnosis from those without. They each obtained a mean F1-score of 0.64 across the experimental setups used for binary classification. The results also suggested that the final 12 to 16 weeks (about 3-4 months) of posts before a depressed user’s estimated diagnosis date are the most indicative of their illness, with data before that period not helping the models detect more accurately. Furthermore, in the 4- to 8-week period before the user’s estimated diagnosis date, their posts exhibited more negative sentiment than any other 4-week period in their post history. CONCLUSIONS: Transformer-based LMs may be used on data from web-based social media forums to identify users at risk for psychiatric conditions such as depression. Language features picked up by these classifiers might predate depression onset by weeks to months, enabling proactive mental health care interventions to support those at risk for this condition.
format Online
Article
Text
id pubmed-7614849
institution National Center for Biotechnology Information
language English
publishDate 2023
record_format MEDLINE/PubMed
spelling pubmed-7614849 2023-07-31 Enabling Early Health Care Intervention by Detecting Depression in Users of Web-Based Forums using Language Models: Longitudinal Analysis and Evaluation Owen, David Antypas, Dimosthenis Hassoulas, Athanasios Pardiñas, Antonio F Espinosa-Anke, Luis Collados, Jose Camacho JMIR AI Article BACKGROUND: Major depressive disorder is a common mental disorder affecting 5% of adults worldwide. Early contact with health care services is critical for achieving accurate diagnosis and improving patient outcomes. Key symptoms of major depressive disorder (depression hereafter) such as cognitive distortions are observed in verbal communication, which can also manifest in the structure of written language. Thus, the automatic analysis of text outputs may provide opportunities for early intervention in settings where written communication is rich and regular, such as social media and web-based forums. OBJECTIVE: The objective of this study was 2-fold. We sought to gauge the effectiveness of different machine learning approaches to identify users of the mass web-based forum Reddit, who eventually disclose a diagnosis of depression. We then aimed to determine whether the time between a forum post and a depression diagnosis date was a relevant factor in performing this detection. METHODS: A total of 2 Reddit data sets containing posts belonging to users with and without a history of depression diagnosis were obtained. The intersection of these data sets provided users with an estimated date of depression diagnosis. This derived data set was used as an input for several machine learning classifiers, including transformer-based language models (LMs). RESULTS: Bidirectional Encoder Representations from Transformers (BERT) and MentalBERT transformer-based LMs proved the most effective in distinguishing forum users with a known depression diagnosis from those without. They each obtained a mean F1-score of 0.64 across the experimental setups used for binary classification. The results also suggested that the final 12 to 16 weeks (about 3-4 months) of posts before a depressed user’s estimated diagnosis date are the most indicative of their illness, with data before that period not helping the models detect more accurately. Furthermore, in the 4- to 8-week period before the user’s estimated diagnosis date, their posts exhibited more negative sentiment than any other 4-week period in their post history. CONCLUSIONS: Transformer-based LMs may be used on data from web-based social media forums to identify users at risk for psychiatric conditions such as depression. Language features picked up by these classifiers might predate depression onset by weeks to months, enabling proactive mental health care interventions to support those at risk for this condition. 2023-03-24 /pmc/articles/PMC7614849/ /pubmed/37525646 http://dx.doi.org/10.2196/41205 Text en https://creativecommons.org/licenses/by/4.0/ This work is licensed under a CC BY 4.0 (https://creativecommons.org/licenses/by/4.0/) International license. https://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in JMIR AI, is properly cited. The complete bibliographic information, a link to the original publication on https://www.ai.jmir.org/, as well as this copyright and license information must be included.
spellingShingle Article
Owen, David
Antypas, Dimosthenis
Hassoulas, Athanasios
Pardiñas, Antonio F
Espinosa-Anke, Luis
Collados, Jose Camacho
Enabling Early Health Care Intervention by Detecting Depression in Users of Web-Based Forums using Language Models: Longitudinal Analysis and Evaluation
title Enabling Early Health Care Intervention by Detecting Depression in Users of Web-Based Forums using Language Models: Longitudinal Analysis and Evaluation
title_full Enabling Early Health Care Intervention by Detecting Depression in Users of Web-Based Forums using Language Models: Longitudinal Analysis and Evaluation
title_fullStr Enabling Early Health Care Intervention by Detecting Depression in Users of Web-Based Forums using Language Models: Longitudinal Analysis and Evaluation
title_full_unstemmed Enabling Early Health Care Intervention by Detecting Depression in Users of Web-Based Forums using Language Models: Longitudinal Analysis and Evaluation
title_short Enabling Early Health Care Intervention by Detecting Depression in Users of Web-Based Forums using Language Models: Longitudinal Analysis and Evaluation
title_sort enabling early health care intervention by detecting depression in users of web-based forums using language models: longitudinal analysis and evaluation
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7614849/
https://www.ncbi.nlm.nih.gov/pubmed/37525646
http://dx.doi.org/10.2196/41205
work_keys_str_mv AT owendavid enablingearlyhealthcareinterventionbydetectingdepressioninusersofwebbasedforumsusinglanguagemodelslongitudinalanalysisandevaluation
AT antypasdimosthenis enablingearlyhealthcareinterventionbydetectingdepressioninusersofwebbasedforumsusinglanguagemodelslongitudinalanalysisandevaluation
AT hassoulasathanasios enablingearlyhealthcareinterventionbydetectingdepressioninusersofwebbasedforumsusinglanguagemodelslongitudinalanalysisandevaluation
AT pardinasantoniof enablingearlyhealthcareinterventionbydetectingdepressioninusersofwebbasedforumsusinglanguagemodelslongitudinalanalysisandevaluation
AT espinosaankeluis enablingearlyhealthcareinterventionbydetectingdepressioninusersofwebbasedforumsusinglanguagemodelslongitudinalanalysisandevaluation
AT colladosjosecamacho enablingearlyhealthcareinterventionbydetectingdepressioninusersofwebbasedforumsusinglanguagemodelslongitudinalanalysisandevaluation