
Regularized Denoising Masked Visual Pretraining for Robust Embodied PointGoal Navigation

Bibliographic Details
Main Authors: Peng, Jie, Xu, Yangbin, Luo, Luqing, Liu, Haiyang, Lu, Kaiqiang, Liu, Jian
Format: Online Article Text
Language: English
Published: MDPI 2023
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10098958/
https://www.ncbi.nlm.nih.gov/pubmed/37050615
http://dx.doi.org/10.3390/s23073553
_version_ 1785024940441862144
author Peng, Jie
Xu, Yangbin
Luo, Luqing
Liu, Haiyang
Lu, Kaiqiang
Liu, Jian
author_facet Peng, Jie
Xu, Yangbin
Luo, Luqing
Liu, Haiyang
Lu, Kaiqiang
Liu, Jian
author_sort Peng, Jie
collection PubMed
description Embodied PointGoal navigation is a fundamental task for embodied agents. Recent works have shown that the performance of embodied navigation agents degrades significantly in the presence of visual corruption, including Spatter, Speckle Noise, and Defocus Blur, revealing the agents' weak robustness. To improve the robustness of embodied navigation agents to various visual corruptions, we propose a navigation framework called Regularized Denoising Masked AutoEncoders Navigation (RDMAE-Nav). In a nutshell, RDMAE-Nav consists of two modules: a visual module and a policy module. In the visual module, a self-supervised pretraining method, dubbed Regularized Denoising Masked AutoEncoders (RDMAE), is designed to enable the Vision Transformer (ViT)-based visual encoder to learn robust representations. The bidirectional Kullback–Leibler divergence is introduced in RDMAE as the regularization term for the denoising masked modeling task. Specifically, RDMAE mitigates the gap between clean and noisy image representations by minimizing this bidirectional Kullback–Leibler divergence, and the visual encoder is then pretrained with RDMAE. In contrast to existing works, RDMAE-Nav applies denoising masked visual pretraining to PointGoal navigation to improve robustness to various visual corruptions. Finally, the pretrained visual encoder, with frozen weights, is used to extract robust visual representations for policy learning in RDMAE-Nav. Extensive experiments show that RDMAE-Nav performs competitively compared with state-of-the-art methods (SOTAs) under various visual corruptions. In detail, RDMAE-Nav achieves absolute improvements of 28.2% in success rate (SR) and 23.68% in success weighted by path length (SPL) under Spatter, 2.28% in SR and 6.41% in SPL under Speckle Noise, and 9.46% in SR and 9.55% in SPL under Defocus Blur.
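A minimal sketch of the regularization described in the abstract, assuming $p_c$ and $p_n$ denote the encoder's representation distributions for a clean image and its corrupted counterpart; the reconstruction loss $\mathcal{L}_{\mathrm{rec}}$ and the weighting factor $\lambda$ are illustrative assumptions, since the record only names the bidirectional Kullback–Leibler term:

\[
\mathcal{L}_{\mathrm{reg}} = D_{\mathrm{KL}}\!\left(p_c \,\Vert\, p_n\right) + D_{\mathrm{KL}}\!\left(p_n \,\Vert\, p_c\right),
\qquad
\mathcal{L}_{\mathrm{RDMAE}} = \mathcal{L}_{\mathrm{rec}} + \lambda\,\mathcal{L}_{\mathrm{reg}}
\]

Minimizing $\mathcal{L}_{\mathrm{reg}}$ pulls the clean and corrupted representations toward each other in both directions, which is what the abstract describes as mitigating the gap between clean and noisy image representations.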
format Online
Article
Text
id pubmed-10098958
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher MDPI
record_format MEDLINE/PubMed
spelling pubmed-100989582023-04-14 Regularized Denoising Masked Visual Pretraining for Robust Embodied PointGoal Navigation Peng, Jie Xu, Yangbin Luo, Luqing Liu, Haiyang Lu, Kaiqiang Liu, Jian Sensors (Basel) Article MDPI 2023-03-28 /pmc/articles/PMC10098958/ /pubmed/37050615 http://dx.doi.org/10.3390/s23073553 Text en © 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
spellingShingle Article
Peng, Jie
Xu, Yangbin
Luo, Luqing
Liu, Haiyang
Lu, Kaiqiang
Liu, Jian
Regularized Denoising Masked Visual Pretraining for Robust Embodied PointGoal Navigation
title Regularized Denoising Masked Visual Pretraining for Robust Embodied PointGoal Navigation
title_full Regularized Denoising Masked Visual Pretraining for Robust Embodied PointGoal Navigation
title_fullStr Regularized Denoising Masked Visual Pretraining for Robust Embodied PointGoal Navigation
title_full_unstemmed Regularized Denoising Masked Visual Pretraining for Robust Embodied PointGoal Navigation
title_short Regularized Denoising Masked Visual Pretraining for Robust Embodied PointGoal Navigation
title_sort regularized denoising masked visual pretraining for robust embodied pointgoal navigation
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10098958/
https://www.ncbi.nlm.nih.gov/pubmed/37050615
http://dx.doi.org/10.3390/s23073553
work_keys_str_mv AT pengjie regularizeddenoisingmaskedvisualpretrainingforrobustembodiedpointgoalnavigation
AT xuyangbin regularizeddenoisingmaskedvisualpretrainingforrobustembodiedpointgoalnavigation
AT luoluqing regularizeddenoisingmaskedvisualpretrainingforrobustembodiedpointgoalnavigation
AT liuhaiyang regularizeddenoisingmaskedvisualpretrainingforrobustembodiedpointgoalnavigation
AT lukaiqiang regularizeddenoisingmaskedvisualpretrainingforrobustembodiedpointgoalnavigation
AT liujian regularizeddenoisingmaskedvisualpretrainingforrobustembodiedpointgoalnavigation