Multi-Head Attention-Based Long Short-Term Memory for Depression Detection From Speech

Depression is a mental disorder that threatens the health and normal life of people. Hence, it is essential to provide an effective way to detect depression. However, research on depression detection mainly focuses on utilizing different parallel features from audio, video, and text for performance...

Bibliographic Details
Main authors:	Zhao, Yan, Liang, Zhenlin, Du, Jing, Zhang, Li, Liu, Chengyu, Zhao, Li
Format:	Online Article Text
Language:	English
Published:	Frontiers Media S.A. 2021
Subjects:	Neuroscience
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8426553/ https://www.ncbi.nlm.nih.gov/pubmed/34512301 http://dx.doi.org/10.3389/fnbot.2021.684037

_version_	1783750065424171008
author	Zhao, Yan Liang, Zhenlin Du, Jing Zhang, Li Liu, Chengyu Zhao, Li
author_sort	Zhao, Yan
collection	PubMed
description	Depression is a mental disorder that threatens the health and normal life of people. Hence, it is essential to provide an effective way to detect depression. However, research on depression detection mainly focuses on utilizing different parallel features from audio, video, and text for performance enhancement without making full use of the information inherent in speech. To focus on the more emotionally salient regions of depressed speech, we propose a multi-head time-dimension attention-based long short-term memory (LSTM) model. We first extract frame-level features to preserve the original temporal relationships of a speech sequence and then analyze how they differ between the speech of depressed and healthy speakers. We then study the performance of various features and use a modified feature set as the input to the LSTM layer. Instead of using the output of the traditional LSTM directly, multi-head time-dimension attention projects the output into different subspaces to capture more of the key temporal information related to depression detection. The experimental results show that the proposed model improves on the LSTM model by 2.3% and 10.3% on the Distress Analysis Interview Corpus-Wizard of Oz (DAIC-WOZ) and the Multi-modal Open Dataset for Mental-disorder Analysis (MODMA) corpus, respectively.
format	Online Article Text
id	pubmed-8426553
institution	National Center for Biotechnology Information
language	English
publishDate	2021
publisher	Frontiers Media S.A.
record_format	MEDLINE/PubMed
spelling	pubmed-8426553 2021-09-10 Multi-Head Attention-Based Long Short-Term Memory for Depression Detection From Speech Zhao, Yan Liang, Zhenlin Du, Jing Zhang, Li Liu, Chengyu Zhao, Li Front Neurorobot Neuroscience Frontiers Media S.A. 2021-08-26 /pmc/articles/PMC8426553/ /pubmed/34512301 http://dx.doi.org/10.3389/fnbot.2021.684037 Text en Copyright © 2021 Zhao, Liang, Du, Zhang, Liu and Zhao. https://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
title	Multi-Head Attention-Based Long Short-Term Memory for Depression Detection From Speech
topic	Neuroscience
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8426553/ https://www.ncbi.nlm.nih.gov/pubmed/34512301 http://dx.doi.org/10.3389/fnbot.2021.684037
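The description field above says the model replaces the plain LSTM output with multi-head attention over the time dimension, projecting the output into different subspaces. The record does not give the equations, so the following is only a minimal sketch under stated assumptions: each head applies its own linear projection to every frame, scores each time step against a per-head vector, normalizes the scores with a softmax over time, and takes the weighted sum; the head summaries are then concatenated. The function names, the scoring scheme, and the random toy parameters are all hypothetical and may differ from the authors' formulation.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def multi_head_time_attention(outputs, projections, queries):
    """Pool a sequence of LSTM outputs over time with several attention heads.

    outputs:     T x D list of per-frame LSTM output vectors (assumed given)
    projections: one K x D matrix per head, mapping a frame into that head's subspace
    queries:     one length-K scoring vector per head (hypothetical parameters)
    Returns (pooled, weights): the concatenated head summaries (H*K values)
    and each head's attention weights over the T time steps.
    """
    pooled, all_weights = [], []
    for proj, query in zip(projections, queries):
        # Project every frame into this head's subspace.
        projected = [[sum(w * x for w, x in zip(row, frame)) for row in proj]
                     for frame in outputs]
        # One scalar score per time step, normalized over the time dimension.
        scores = [sum(q * p for q, p in zip(query, frame)) for frame in projected]
        weights = softmax(scores)
        # The weighted sum over time is this head's utterance-level summary.
        summary = [sum(w * frame[k] for w, frame in zip(weights, projected))
                   for k in range(len(query))]
        pooled.extend(summary)
        all_weights.append(weights)
    return pooled, all_weights


def demo():
    """Run the sketch on random toy data (not real speech features)."""
    rng = random.Random(0)
    T, D, H, K = 6, 8, 2, 4  # frames, LSTM size, heads, subspace size
    outputs = [[rng.uniform(-1, 1) for _ in range(D)] for _ in range(T)]
    projections = [[[rng.uniform(-1, 1) for _ in range(D)] for _ in range(K)]
                   for _ in range(H)]
    queries = [[rng.uniform(-1, 1) for _ in range(K)] for _ in range(H)]
    return multi_head_time_attention(outputs, projections, queries)
```

In a trained model the projections and scoring vectors would be learned jointly with the LSTM, and the pooled vector would feed a classifier; here they are random only to show the shapes involved.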