
Depression Speech Recognition With a Three-Dimensional Convolutional Network

Depression has become one of the main afflictions that threaten people's mental health. However, the current traditional diagnosis methods have certain limitations, so it is necessary to find a method of objective evaluation of depression based on intelligent technology to assist in the early d...


Bibliographic Details
Main authors: Wang, Hongbo, Liu, Yu, Zhen, Xiaoxiao, Tu, Xuyan
Format: Online Article Text
Language: English
Published: Frontiers Media S.A. 2021
Subjects: Human Neuroscience
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8514878/
https://www.ncbi.nlm.nih.gov/pubmed/34658815
http://dx.doi.org/10.3389/fnhum.2021.713823
author Wang, Hongbo
Liu, Yu
Zhen, Xiaoxiao
Tu, Xuyan
collection PubMed
description Depression has become one of the main afflictions threatening people's mental health. Traditional diagnostic methods have certain limitations, so an objective method of evaluating depression based on intelligent technology is needed to assist in the early diagnosis and treatment of patients. Because the abnormal speech features of patients with depression are related to their mental state to some extent, speech acoustic features are valuable as objective indicators for the diagnosis of depression. To address the complexity of speech in depression and the limited performance of traditional feature extraction methods for speech signals, this article proposes a Three-Dimensional Convolutional filter bank with Highway Networks and a Bidirectional GRU (Gated Recurrent Unit) with an Attention mechanism (3D-CBHGA for short), which includes two key strategies. (1) Three-dimensional feature extraction of the speech signal allows depression-related signals to be expressed in a timely way. (2) The attention mechanism in the GRU network weights the frame-level vectors to obtain the hidden emotion vector by self-learning. Experiments show that the proposed 3D-CBHGA establishes a good mapping from speech signals to depression-related features and improves the accuracy of depression detection in speech signals.
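Strategy (2) in the description, weighting frame-level vectors to obtain a single hidden vector, can be illustrated with a minimal attention-pooling sketch. This is not the authors' implementation: the scoring function (a dot product between a context vector and the tanh of each frame), the `attention_pool` name, and the toy inputs are all assumptions chosen for illustration, and in a real model the context vector would be learned during training.

```python
import math

def attention_pool(frames, context):
    """Collapse frame-level vectors into one utterance-level vector.

    frames:  list of T vectors, each of length D (e.g. per-frame GRU outputs)
    context: hypothetical learned vector of length D used to score frames
    """
    # One scalar relevance score per frame: context . tanh(frame)
    scores = [sum(c * math.tanh(x) for c, x in zip(context, f)) for f in frames]
    # Softmax over time, shifted by the max score for numerical stability
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    # Weighted sum of the frames gives the pooled vector
    dim = len(frames[0])
    pooled = [sum(w * f[d] for w, f in zip(weights, frames)) for d in range(dim)]
    return pooled, weights

# Toy input: three 2-dimensional frame vectors (made-up values)
frames = [[0.1, 0.2], [2.0, 1.5], [0.0, -0.3]]
pooled, weights = attention_pool(frames, context=[1.0, 1.0])
```

The weights are non-negative and sum to one, so the pooled vector is a convex combination of the frames in which higher-scoring frames contribute more.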
format Online
Article
Text
id pubmed-8514878
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher Frontiers Media S.A.
record_format MEDLINE/PubMed
spelling pubmed-8514878 2021-10-15 Depression Speech Recognition With a Three-Dimensional Convolutional Network Wang, Hongbo Liu, Yu Zhen, Xiaoxiao Tu, Xuyan Front Hum Neurosci Human Neuroscience Frontiers Media S.A. 2021-09-30 /pmc/articles/PMC8514878/ /pubmed/34658815 http://dx.doi.org/10.3389/fnhum.2021.713823 Text en Copyright © 2021 Wang, Liu, Zhen and Tu. https://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
title Depression Speech Recognition With a Three-Dimensional Convolutional Network
topic Human Neuroscience