
Surrounding-aware representation prediction in Birds-Eye-View using transformers

Birds-Eye-View (BEV) maps provide an accurate representation of sensory cues present in the surroundings, including dynamic and static elements. Generating a semantic representation of BEV maps can be a challenging task since it relies on object detection and image segmentation. Recent studies have...

Full description

Bibliographic Details
Main Authors: Yu, Jiahui, Zheng, Wenli, Chen, Yongquan, Zhang, Yutong, Huang, Rui
Format: Online Article Text
Language: English
Published: Frontiers Media S.A. 2023
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10352774/
https://www.ncbi.nlm.nih.gov/pubmed/37469840
http://dx.doi.org/10.3389/fnins.2023.1219363
_version_ 1785074582711959552
author Yu, Jiahui
Zheng, Wenli
Chen, Yongquan
Zhang, Yutong
Huang, Rui
author_facet Yu, Jiahui
Zheng, Wenli
Chen, Yongquan
Zhang, Yutong
Huang, Rui
author_sort Yu, Jiahui
collection PubMed
description Birds-Eye-View (BEV) maps provide an accurate representation of sensory cues present in the surroundings, including dynamic and static elements. Generating a semantic representation of BEV maps can be a challenging task since it relies on object detection and image segmentation. Recent studies have developed Convolutional Neural Networks (CNNs) to tackle the underlying challenge. However, current CNN-based models encounter a bottleneck in perceiving subtle nuances of information due to their limited capacity, which constrains the efficiency and accuracy of representation prediction, especially for multi-scale and multi-class elements. To address this issue, we propose novel neural networks for BEV semantic representation prediction that are built upon Transformers without convolution layers, in a way significantly different from existing pure CNNs and hybrid architectures that merge CNNs and Transformers. Given a sequence of image frames as input, the proposed neural networks can directly output BEV maps with per-class probabilities in end-to-end forecasting. The core innovations of the current study include (1) a new pixel generation method powered by Transformers, (2) a novel algorithm for image-to-BEV transformation, and (3) a novel network for image feature extraction using attention mechanisms. We evaluate the proposed model's performance on two challenging benchmarks, the NuScenes dataset and the Argoverse 3D dataset, and compare it with state-of-the-art methods. Results show that the proposed model outperforms CNNs, achieving relative improvements of 2.4% and 5.2% on the NuScenes and Argoverse 3D datasets, respectively.
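The abstract's attention-based image-to-BEV transformation can be illustrated with a minimal sketch. This is not the authors' implementation: the module name ImageToBEVCrossAttention, the use of learned per-cell BEV queries, and all dimensions are assumptions made purely for illustration, built on PyTorch's nn.MultiheadAttention.

    # Hypothetical sketch of attention-based image-to-BEV projection.
    # NOT the paper's architecture; names and dimensions are illustrative only.
    import torch
    import torch.nn as nn

    class ImageToBEVCrossAttention(nn.Module):
        """Cross-attends learned BEV grid queries to flattened image features."""
        def __init__(self, embed_dim=256, num_heads=8, bev_h=50, bev_w=50, num_classes=14):
            super().__init__()
            # One learned query per BEV grid cell (assumed design choice).
            self.bev_queries = nn.Parameter(torch.randn(bev_h * bev_w, embed_dim))
            self.cross_attn = nn.MultiheadAttention(embed_dim, num_heads, batch_first=True)
            self.norm = nn.LayerNorm(embed_dim)
            # Per-cell classifier; softmax over the last dim would give per-class probabilities.
            self.head = nn.Linear(embed_dim, num_classes)
            self.bev_h, self.bev_w = bev_h, bev_w

        def forward(self, image_tokens):
            # image_tokens: (batch, num_tokens, embed_dim) from some transformer image encoder
            b = image_tokens.size(0)
            queries = self.bev_queries.unsqueeze(0).expand(b, -1, -1)
            attended, _ = self.cross_attn(queries, image_tokens, image_tokens)
            bev = self.norm(attended + queries)
            logits = self.head(bev)                          # (batch, H*W, num_classes)
            return logits.reshape(b, self.bev_h, self.bev_w, -1)

    # Usage with random features standing in for encoded camera frames:
    tokens = torch.randn(2, 1024, 256)
    bev_logits = ImageToBEVCrossAttention()(tokens)
    print(bev_logits.shape)  # torch.Size([2, 50, 50, 14])

A learned-query cross-attention layer of this kind is one common way to map perspective image features onto a rasterized BEV grid without convolutions; the paper's actual pixel generation and transformation algorithms may differ.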
format Online
Article
Text
id pubmed-10352774
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher Frontiers Media S.A.
record_format MEDLINE/PubMed
spelling pubmed-10352774 2023-07-19 Surrounding-aware representation prediction in Birds-Eye-View using transformers Yu, Jiahui Zheng, Wenli Chen, Yongquan Zhang, Yutong Huang, Rui Front Neurosci Neuroscience Birds-Eye-View (BEV) maps provide an accurate representation of sensory cues present in the surroundings, including dynamic and static elements. Generating a semantic representation of BEV maps can be a challenging task since it relies on object detection and image segmentation. Recent studies have developed Convolutional Neural Networks (CNNs) to tackle the underlying challenge. However, current CNN-based models encounter a bottleneck in perceiving subtle nuances of information due to their limited capacity, which constrains the efficiency and accuracy of representation prediction, especially for multi-scale and multi-class elements. To address this issue, we propose novel neural networks for BEV semantic representation prediction that are built upon Transformers without convolution layers, in a way significantly different from existing pure CNNs and hybrid architectures that merge CNNs and Transformers. Given a sequence of image frames as input, the proposed neural networks can directly output BEV maps with per-class probabilities in end-to-end forecasting. The core innovations of the current study include (1) a new pixel generation method powered by Transformers, (2) a novel algorithm for image-to-BEV transformation, and (3) a novel network for image feature extraction using attention mechanisms. We evaluate the proposed model's performance on two challenging benchmarks, the NuScenes dataset and the Argoverse 3D dataset, and compare it with state-of-the-art methods. Results show that the proposed model outperforms CNNs, achieving relative improvements of 2.4% and 5.2% on the NuScenes and Argoverse 3D datasets, respectively. Frontiers Media S.A. 2023-07-04 /pmc/articles/PMC10352774/ /pubmed/37469840 http://dx.doi.org/10.3389/fnins.2023.1219363 Text en Copyright © 2023 Yu, Zheng, Chen, Zhang and Huang. https://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
spellingShingle Neuroscience
Yu, Jiahui
Zheng, Wenli
Chen, Yongquan
Zhang, Yutong
Huang, Rui
Surrounding-aware representation prediction in Birds-Eye-View using transformers
title Surrounding-aware representation prediction in Birds-Eye-View using transformers
title_full Surrounding-aware representation prediction in Birds-Eye-View using transformers
title_fullStr Surrounding-aware representation prediction in Birds-Eye-View using transformers
title_full_unstemmed Surrounding-aware representation prediction in Birds-Eye-View using transformers
title_short Surrounding-aware representation prediction in Birds-Eye-View using transformers
title_sort surrounding-aware representation prediction in birds-eye-view using transformers
topic Neuroscience
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10352774/
https://www.ncbi.nlm.nih.gov/pubmed/37469840
http://dx.doi.org/10.3389/fnins.2023.1219363
work_keys_str_mv AT yujiahui surroundingawarerepresentationpredictioninbirdseyeviewusingtransformers
AT zhengwenli surroundingawarerepresentationpredictioninbirdseyeviewusingtransformers
AT chenyongquan surroundingawarerepresentationpredictioninbirdseyeviewusingtransformers
AT zhangyutong surroundingawarerepresentationpredictioninbirdseyeviewusingtransformers
AT huangrui surroundingawarerepresentationpredictioninbirdseyeviewusingtransformers