
Multi-Modal Adaptive Fusion Transformer Network for the Estimation of Depression Level

Bibliographic Details
Main Authors: Sun, Hao, Liu, Jiaqing, Chai, Shurong, Qiu, Zhaolin, Lin, Lanfen, Huang, Xinyin, Chen, Yenwei
Format: Online Article Text
Language: English
Published: MDPI 2021
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8309914/
https://www.ncbi.nlm.nih.gov/pubmed/34300504
http://dx.doi.org/10.3390/s21144764
_version_ 1783728635455209472
author Sun, Hao
Liu, Jiaqing
Chai, Shurong
Qiu, Zhaolin
Lin, Lanfen
Huang, Xinyin
Chen, Yenwei
author_facet Sun, Hao
Liu, Jiaqing
Chai, Shurong
Qiu, Zhaolin
Lin, Lanfen
Huang, Xinyin
Chen, Yenwei
author_sort Sun, Hao
collection PubMed
description Depression is a severe psychological condition that affects millions of people worldwide. As depression has received more attention in recent years, it has become imperative to develop automatic methods for detecting it. Although numerous machine learning methods have been proposed for estimating depression levels via audio, visual, and audiovisual emotion sensing, several challenges remain. For example, it is difficult to extract long-term temporal context information from long sequences of audio and visual data, and it is also difficult to select and fuse useful multi-modal information or features effectively. In addition, how to incorporate other information or tasks to enhance estimation accuracy remains an open question. In this study, we propose a multi-modal adaptive fusion transformer network for estimating depression levels. Transformer-based models have achieved state-of-the-art performance in language understanding and sequence modeling; thus, the proposed transformer-based network is used to extract long-term temporal context information from uni-modal audio and visual data. This is the first transformer-based approach for depression detection. We also propose an adaptive fusion method for selecting and fusing useful multi-modal features. Furthermore, inspired by recent multi-task learning work, we incorporate an auxiliary task (depression classification) to enhance the main task of depression level regression (estimation). The effectiveness of the proposed method has been validated on a public dataset (the AVEC 2019 Detecting Depression with AI Sub-challenge) in terms of PHQ-8 scores. Experimental results indicate that the proposed method outperforms current state-of-the-art methods, achieving a concordance correlation coefficient (CCC) of 0.733 on AVEC 2019, which is 6.2% higher than that of the previous state-of-the-art method (CCC = 0.696).
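Note: this record contains only the abstract, not the authors' code. The sketch below is a minimal illustration, under stated assumptions, of the three components the abstract names: uni-modal transformer encoders for long-term temporal context, an adaptive (here, learned softmax-weighted) fusion of audio and visual features, and a multi-task head combining PHQ-8 regression with an auxiliary depression classifier, plus the concordance correlation coefficient (CCC) used as the AVEC 2019 evaluation metric. The layer sizes, mean pooling, and gating form are illustrative assumptions, not details taken from the paper.

```python
# Illustrative sketch only -- not the authors' released implementation.
# Assumes pre-extracted frame-level audio/visual feature sequences.
import torch
import torch.nn as nn


class AdaptiveFusionSketch(nn.Module):
    def __init__(self, audio_dim, visual_dim, d_model=128, n_heads=4, n_layers=2):
        super().__init__()
        # Project each modality's frame-level features to a shared width.
        self.audio_proj = nn.Linear(audio_dim, d_model)
        self.visual_proj = nn.Linear(visual_dim, d_model)
        # Uni-modal transformer encoders model long-term temporal context.
        self.audio_enc = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True), n_layers)
        self.visual_enc = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True), n_layers)
        # "Adaptive fusion" stand-in: per-sample weights over the two modalities.
        self.fusion_gate = nn.Linear(2 * d_model, 2)
        # Multi-task heads: PHQ-8 regression (main task) and a binary
        # depressed / not-depressed classifier (auxiliary task).
        self.regressor = nn.Linear(d_model, 1)
        self.classifier = nn.Linear(d_model, 2)

    def forward(self, audio, visual):
        # audio: (B, T_a, audio_dim); visual: (B, T_v, visual_dim)
        a = self.audio_enc(self.audio_proj(audio)).mean(dim=1)   # (B, d_model)
        v = self.visual_enc(self.visual_proj(visual)).mean(dim=1)
        w = torch.softmax(self.fusion_gate(torch.cat([a, v], dim=-1)), dim=-1)
        fused = w[:, :1] * a + w[:, 1:] * v                      # weighted sum
        return self.regressor(fused).squeeze(-1), self.classifier(fused)


def ccc(pred, target):
    """Concordance correlation coefficient, the AVEC 2019 DDS evaluation metric."""
    pm, tm = pred.mean(), target.mean()
    cov = ((pred - pm) * (target - tm)).mean()
    return 2 * cov / (pred.var(unbiased=False) + target.var(unbiased=False)
                      + (pm - tm) ** 2)
```

In training, a joint objective such as a CCC-based regression loss plus a cross-entropy term for the auxiliary classifier would reflect the multi-task setup described in the abstract; the actual feature extractors, fusion design, and loss weighting used by the authors are given in the full paper at the DOI above.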
format Online
Article
Text
id pubmed-8309914
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher MDPI
record_format MEDLINE/PubMed
spelling pubmed-83099142021-07-25 Multi-Modal Adaptive Fusion Transformer Network for the Estimation of Depression Level Sun, Hao Liu, Jiaqing Chai, Shurong Qiu, Zhaolin Lin, Lanfen Huang, Xinyin Chen, Yenwei Sensors (Basel) Article Depression is a severe psychological condition that affects millions of people worldwide. As depression has received more attention in recent years, it has become imperative to develop automatic methods for detecting it. Although numerous machine learning methods have been proposed for estimating depression levels via audio, visual, and audiovisual emotion sensing, several challenges remain. For example, it is difficult to extract long-term temporal context information from long sequences of audio and visual data, and it is also difficult to select and fuse useful multi-modal information or features effectively. In addition, how to incorporate other information or tasks to enhance estimation accuracy remains an open question. In this study, we propose a multi-modal adaptive fusion transformer network for estimating depression levels. Transformer-based models have achieved state-of-the-art performance in language understanding and sequence modeling; thus, the proposed transformer-based network is used to extract long-term temporal context information from uni-modal audio and visual data. This is the first transformer-based approach for depression detection. We also propose an adaptive fusion method for selecting and fusing useful multi-modal features. Furthermore, inspired by recent multi-task learning work, we incorporate an auxiliary task (depression classification) to enhance the main task of depression level regression (estimation). The effectiveness of the proposed method has been validated on a public dataset (the AVEC 2019 Detecting Depression with AI Sub-challenge) in terms of PHQ-8 scores. Experimental results indicate that the proposed method outperforms current state-of-the-art methods, achieving a concordance correlation coefficient (CCC) of 0.733 on AVEC 2019, which is 6.2% higher than that of the previous state-of-the-art method (CCC = 0.696). MDPI 2021-07-12 /pmc/articles/PMC8309914/ /pubmed/34300504 http://dx.doi.org/10.3390/s21144764 Text en © 2021 by the authors. https://creativecommons.org/licenses/by/4.0/ Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
spellingShingle Article
Sun, Hao
Liu, Jiaqing
Chai, Shurong
Qiu, Zhaolin
Lin, Lanfen
Huang, Xinyin
Chen, Yenwei
Multi-Modal Adaptive Fusion Transformer Network for the Estimation of Depression Level
title Multi-Modal Adaptive Fusion Transformer Network for the Estimation of Depression Level
title_full Multi-Modal Adaptive Fusion Transformer Network for the Estimation of Depression Level
title_fullStr Multi-Modal Adaptive Fusion Transformer Network for the Estimation of Depression Level
title_full_unstemmed Multi-Modal Adaptive Fusion Transformer Network for the Estimation of Depression Level
title_short Multi-Modal Adaptive Fusion Transformer Network for the Estimation of Depression Level
title_sort multi-modal adaptive fusion transformer network for the estimation of depression level
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8309914/
https://www.ncbi.nlm.nih.gov/pubmed/34300504
http://dx.doi.org/10.3390/s21144764
work_keys_str_mv AT sunhao multimodaladaptivefusiontransformernetworkfortheestimationofdepressionlevel
AT liujiaqing multimodaladaptivefusiontransformernetworkfortheestimationofdepressionlevel
AT chaishurong multimodaladaptivefusiontransformernetworkfortheestimationofdepressionlevel
AT qiuzhaolin multimodaladaptivefusiontransformernetworkfortheestimationofdepressionlevel
AT linlanfen multimodaladaptivefusiontransformernetworkfortheestimationofdepressionlevel
AT huangxinyin multimodaladaptivefusiontransformernetworkfortheestimationofdepressionlevel
AT chenyenwei multimodaladaptivefusiontransformernetworkfortheestimationofdepressionlevel