
Outdoor Vision-and-Language Navigation Needs Object-Level Alignment


Bibliographic Details
Main Authors: Sun, Yanjun, Qiu, Yue, Aoki, Yoshimitsu, Kataoka, Hirokatsu
Format: Online Article Text
Language: English
Published: MDPI 2023
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10346337/
https://www.ncbi.nlm.nih.gov/pubmed/37447877
http://dx.doi.org/10.3390/s23136028
_version_ 1785073291604525056
author Sun, Yanjun
Qiu, Yue
Aoki, Yoshimitsu
Kataoka, Hirokatsu
author_facet Sun, Yanjun
Qiu, Yue
Aoki, Yoshimitsu
Kataoka, Hirokatsu
author_sort Sun, Yanjun
collection PubMed
description In the field of embodied AI, vision-and-language navigation (VLN) is a crucial and challenging multi-modal task. Specifically, outdoor VLN involves an agent navigating within a graph-based environment while simultaneously interpreting information from real-world urban environments and natural language instructions. Existing outdoor VLN models predict actions using a combination of panorama and instruction features. However, these methods may cause the agent to struggle to understand complicated outdoor environments, overlook details in the environment, and consequently fail to navigate. Human navigation often relies on specific objects as reference landmarks when traveling to unfamiliar places, providing a more rational and efficient approach to navigation. Inspired by this natural human behavior, we propose an object-level alignment module (OAlM), which guides the agent to focus more on object tokens mentioned in the instructions and to recognize these landmarks during navigation. By treating these landmarks as sub-goals, our method effectively decomposes a long-range path into a series of shorter paths, ultimately improving the agent’s overall performance. In addition to enabling better object recognition and alignment, our proposed OAlM also fosters a more robust and adaptable agent capable of navigating complex environments. This adaptability is particularly crucial for real-world applications, where environmental conditions can be unpredictable and varied. Experimental results show that OAlM is a more object-focused model and that our approach outperforms the baseline on all metrics on the challenging outdoor VLN Touchdown dataset, exceeding it by 3.19% on task completion (TC). These results highlight the potential of leveraging object-level information in the form of sub-goals to improve navigation performance in embodied AI systems, paving the way for more advanced and efficient outdoor navigation.
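
To make the object-level alignment idea in the abstract concrete, below is a minimal, hypothetical sketch (not the authors' published implementation): it scores instruction object tokens against detected visual object features by cosine similarity and treats the first sufficiently matched landmark as the current sub-goal. All function names, embedding shapes, and the matching threshold are illustrative assumptions.

```python
# Hypothetical sketch of object-level alignment between instruction object
# tokens and visual object features; names and shapes are illustrative only.
import torch
import torch.nn.functional as F

def object_alignment_scores(object_token_emb: torch.Tensor,
                            visual_object_emb: torch.Tensor) -> torch.Tensor:
    """Cosine similarity between each instruction object token (T, D)
    and each detected visual object (O, D); returns a (T, O) score grid."""
    t = F.normalize(object_token_emb, dim=-1)
    v = F.normalize(visual_object_emb, dim=-1)
    return t @ v.T

def next_subgoal(scores: torch.Tensor, threshold: float = 0.5):
    """Pick the first instruction landmark whose best visual match clears
    the threshold; using it as the current sub-goal decomposes a long
    path into a series of shorter ones, as the abstract describes."""
    best_match, _ = scores.max(dim=1)  # best visual match per token
    for idx, score in enumerate(best_match):
        if score.item() >= threshold:
            return idx
    return None  # no mentioned landmark is visible yet

# Toy usage with random embeddings standing in for real encoders.
torch.manual_seed(0)
token_emb = torch.randn(3, 256)   # e.g. "traffic light", "bakery", "bench"
object_emb = torch.randn(5, 256)  # objects detected in the current panorama
subgoal = next_subgoal(object_alignment_scores(token_emb, object_emb))
print(f"current sub-goal token index: {subgoal}")
```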
format Online
Article
Text
id pubmed-10346337
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher MDPI
record_format MEDLINE/PubMed
spelling pubmed-10346337 2023-07-15 Outdoor Vision-and-Language Navigation Needs Object-Level Alignment Sun, Yanjun Qiu, Yue Aoki, Yoshimitsu Kataoka, Hirokatsu Sensors (Basel) Article In the field of embodied AI, vision-and-language navigation (VLN) is a crucial and challenging multi-modal task. Specifically, outdoor VLN involves an agent navigating within a graph-based environment while simultaneously interpreting information from real-world urban environments and natural language instructions. Existing outdoor VLN models predict actions using a combination of panorama and instruction features. However, these methods may cause the agent to struggle to understand complicated outdoor environments, overlook details in the environment, and consequently fail to navigate. Human navigation often relies on specific objects as reference landmarks when traveling to unfamiliar places, providing a more rational and efficient approach to navigation. Inspired by this natural human behavior, we propose an object-level alignment module (OAlM), which guides the agent to focus more on object tokens mentioned in the instructions and to recognize these landmarks during navigation. By treating these landmarks as sub-goals, our method effectively decomposes a long-range path into a series of shorter paths, ultimately improving the agent’s overall performance. In addition to enabling better object recognition and alignment, our proposed OAlM also fosters a more robust and adaptable agent capable of navigating complex environments. This adaptability is particularly crucial for real-world applications, where environmental conditions can be unpredictable and varied. Experimental results show that OAlM is a more object-focused model and that our approach outperforms the baseline on all metrics on the challenging outdoor VLN Touchdown dataset, exceeding it by 3.19% on task completion (TC). These results highlight the potential of leveraging object-level information in the form of sub-goals to improve navigation performance in embodied AI systems, paving the way for more advanced and efficient outdoor navigation. MDPI 2023-06-29 /pmc/articles/PMC10346337/ /pubmed/37447877 http://dx.doi.org/10.3390/s23136028 Text en © 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
spellingShingle Article
Sun, Yanjun
Qiu, Yue
Aoki, Yoshimitsu
Kataoka, Hirokatsu
Outdoor Vision-and-Language Navigation Needs Object-Level Alignment
title Outdoor Vision-and-Language Navigation Needs Object-Level Alignment
title_full Outdoor Vision-and-Language Navigation Needs Object-Level Alignment
title_fullStr Outdoor Vision-and-Language Navigation Needs Object-Level Alignment
title_full_unstemmed Outdoor Vision-and-Language Navigation Needs Object-Level Alignment
title_short Outdoor Vision-and-Language Navigation Needs Object-Level Alignment
title_sort outdoor vision-and-language navigation needs object-level alignment
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10346337/
https://www.ncbi.nlm.nih.gov/pubmed/37447877
http://dx.doi.org/10.3390/s23136028
work_keys_str_mv AT sunyanjun outdoorvisionandlanguagenavigationneedsobjectlevelalignment
AT qiuyue outdoorvisionandlanguagenavigationneedsobjectlevelalignment
AT aokiyoshimitsu outdoorvisionandlanguagenavigationneedsobjectlevelalignment
AT kataokahirokatsu outdoorvisionandlanguagenavigationneedsobjectlevelalignment