Cargando…

Depression Speech Recognition With a Three-Dimensional Convolutional Network

Depression has become one of the main afflictions that threaten people's mental health. However, the current traditional diagnosis methods have certain limitations, so it is necessary to find a method of objective evaluation of depression based on intelligent technology to assist in the early d...

Descripción completa

Detalles Bibliográficos
Autores principales: Wang, Hongbo, Liu, Yu, Zhen, Xiaoxiao, Tu, Xuyan
Formato: Online Artículo Texto
Lenguaje:English
Publicado: Frontiers Media S.A. 2021
Materias:
Acceso en línea:https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8514878/
https://www.ncbi.nlm.nih.gov/pubmed/34658815
http://dx.doi.org/10.3389/fnhum.2021.713823
Descripción
Sumario:Depression has become one of the main afflictions that threaten people's mental health. However, the current traditional diagnosis methods have certain limitations, so it is necessary to find a method of objective evaluation of depression based on intelligent technology to assist in the early diagnosis and treatment of patients. Because the abnormal speech features of patients with depression are related to their mental state to some extent, it is valuable to use speech acoustic features as objective indicators for the diagnosis of depression. In order to solve the problem of the complexity of speech in depression and the limited performance of traditional feature extraction methods for speech signals, this article suggests a Three-Dimensional Convolutional filter bank with Highway Networks and Bidirectional GRU (Gated Recurrent Unit) with an Attention mechanism (in short 3D-CBHGA), which includes two key strategies. (1) The three-dimensional feature extraction of the speech signal can timely realize the expression ability of those depression signals. (2) Based on the attention mechanism in the GRU network, the frame-level vector is weighted to get the hidden emotion vector by self-learning. Experiments show that the proposed 3D-CBHGA can well establish mapping from speech signals to depression-related features and improve the accuracy of depression detection in speech signals.