Regularized Denoising Masked Visual Pretraining for Robust Embodied PointGoal Navigation
Main authors: | Peng, Jie; Xu, Yangbin; Luo, Luqing; Liu, Haiyang; Lu, Kaiqiang; Liu, Jian |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | MDPI, 2023 |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10098958/ https://www.ncbi.nlm.nih.gov/pubmed/37050615 http://dx.doi.org/10.3390/s23073553 |
_version_ | 1785024940441862144 |
---|---|
author | Peng, Jie Xu, Yangbin Luo, Luqing Liu, Haiyang Lu, Kaiqiang Liu, Jian |
author_sort | Peng, Jie |
collection | PubMed |
description | Embodied PointGoal navigation is a fundamental task for embodied agents. Recent works have shown that the performance of embodied navigation agents degrades significantly under visual corruptions such as Spatter, Speckle Noise, and Defocus Blur, revealing the agents' weak robustness. To improve robustness to various visual corruptions, we propose a navigation framework called Regularized Denoising Masked AutoEncoders Navigation (RDMAE-Nav). RDMAE-Nav consists of two modules: a visual module and a policy module. In the visual module, a self-supervised pretraining method, dubbed Regularized Denoising Masked AutoEncoders (RDMAE), is designed to enable the Vision Transformer (ViT)-based visual encoder to learn robust representations. RDMAE introduces the bidirectional Kullback–Leibler divergence as a regularization term for a denoising masked modeling task. Minimizing this divergence narrows the gap between the representations of clean and noisy images. The visual encoder is pretrained with RDMAE. In contrast to existing works, RDMAE-Nav applies denoising masked visual pretraining to PointGoal navigation to improve robustness to various visual corruptions. Finally, the pretrained visual encoder, with its weights frozen, extracts robust visual representations for policy learning in RDMAE-Nav. Extensive experiments show that RDMAE-Nav performs competitively against state-of-the-art (SOTA) methods under various visual corruptions. Specifically, RDMAE-Nav achieves the following absolute improvements. Under Spatter, it gains 28.2% in success rate (SR) and 23.68% in success weighted by path length (SPL). Under Speckle Noise, it gains 2.28% in SR and 6.41% in SPL. Under Defocus Blur, it gains 9.46% in SR and 9.55% in SPL. |
format | Online Article Text |
id | pubmed-10098958 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2023 |
publisher | MDPI |
record_format | MEDLINE/PubMed |
spelling | pubmed-10098958 2023-04-14 Sensors (Basel) Article MDPI 2023-03-28 /pmc/articles/PMC10098958/ /pubmed/37050615 http://dx.doi.org/10.3390/s23073553 Text en © 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/). |
title | Regularized Denoising Masked Visual Pretraining for Robust Embodied PointGoal Navigation |
topic | Article |
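
The description says RDMAE combines a denoising masked modeling task with a bidirectional Kullback–Leibler regularizer that narrows the gap between clean and noisy image representations. The record does not give the paper's exact formulation. The sketch below is a minimal, non-authoritative illustration of that kind of objective, and it rests on several assumptions:

- Representations are converted to distributions with a softmax over the feature dimension.
- The bidirectional term is the sum KL(P‖Q) + KL(Q‖P).
- The reconstruction target is the clean image.
- Reconstruction error is counted only on masked patches, as in standard MAE.

The names `rdmae_style_loss`, `random_mask`, `lam`, and `mask_ratio` are hypothetical. The code uses plain Python for readability and is not the authors' implementation.

```python
import math
import random
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax (assumption: how features become distributions)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p: Sequence[float], q: Sequence[float], eps: float = 1e-12) -> float:
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def bidirectional_kl(clean_feat: Sequence[float], noisy_feat: Sequence[float]) -> float:
    """Symmetric KL(P||Q) + KL(Q||P) between softmax-normalised clean and noisy features."""
    p = softmax(clean_feat)
    q = softmax(noisy_feat)
    return kl_divergence(p, q) + kl_divergence(q, p)


def random_mask(num_patches: int, mask_ratio: float, rng: random.Random) -> List[bool]:
    """Hypothetical helper: mark round(num_patches * mask_ratio) random patches as masked (True)."""
    num_masked = int(round(num_patches * mask_ratio))
    masked = set(rng.sample(range(num_patches), num_masked))
    return [i in masked for i in range(num_patches)]


def masked_mse(pred: Sequence[Sequence[float]],
               target: Sequence[Sequence[float]],
               mask: Sequence[bool]) -> float:
    """Mean squared error averaged over masked patches only (MAE-style reconstruction loss)."""
    errors = []
    for p_patch, t_patch, is_masked in zip(pred, target, mask):
        if is_masked:
            errors.append(sum((a - b) ** 2 for a, b in zip(p_patch, t_patch)) / len(t_patch))
    return sum(errors) / len(errors) if errors else 0.0


def rdmae_style_loss(pred: Sequence[Sequence[float]],
                     clean_target: Sequence[Sequence[float]],
                     mask: Sequence[bool],
                     clean_feat: Sequence[float],
                     noisy_feat: Sequence[float],
                     lam: float = 1.0) -> float:
    """Hypothetical objective: reconstruct the clean image from a noisy, masked input,
    plus lam times the bidirectional KL between clean and noisy representations."""
    return masked_mse(pred, clean_target, mask) + lam * bidirectional_kl(clean_feat, noisy_feat)


if __name__ == "__main__":
    rng = random.Random(0)
    mask = random_mask(num_patches=4, mask_ratio=0.75, rng=rng)
    target = [[0.0, 1.0]] * 4
    pred = [[0.0, 1.0]] * 4
    # Perfect reconstruction and identical clean/noisy features give zero loss.
    print(rdmae_style_loss(pred, target, mask, [1.0, 2.0, 3.0], [1.0, 2.0, 3.0]))
```

Using the divergence in both directions makes the penalty symmetric, so neither the clean nor the noisy representation is treated as the fixed reference. In a real ViT pipeline these functions would operate on batched tensors rather than Python lists.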