Fast Panoptic Segmentation with Soft Attention Embeddings

Bibliographic Details
Main Authors: Petrovai, Andra; Nedevschi, Sergiu
Format: Online Article Text
Language: English
Published: MDPI 2022
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8837929/
https://www.ncbi.nlm.nih.gov/pubmed/35161529
http://dx.doi.org/10.3390/s22030783
Description
Summary: Panoptic segmentation provides a rich 2D environment representation by unifying semantic and instance segmentation. Most current state-of-the-art panoptic segmentation methods are built upon two-stage detectors and, because of their high computational complexity, are not suitable for real-time applications such as automated driving. In this work, we introduce a novel, fast and accurate single-stage panoptic segmentation network that employs a shared feature extraction backbone and three network heads for object detection, semantic segmentation, and instance-level attention masks. Guided by object detections, our new panoptic segmentation head learns instance-specific soft attention masks based on spatial embeddings. The semantic masks for stuff classes and the soft instance masks for things classes are pixel-wise coherent and can be easily integrated into a panoptic output. The training and inference pipelines are simplified, and no post-processing of the panoptic output is necessary. Benefiting from fast inference speed, the network can be deployed in automated vehicles or robotic applications. We perform extensive experiments on the COCO and Cityscapes datasets and obtain competitive results in both accuracy and inference time. On the Cityscapes dataset, we achieve a panoptic quality (PQ) of 59.7 with an inference speed of more than 10 FPS on high-resolution 1024 × 2048 images.
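
The summary states that semantic masks for stuff classes and soft instance masks for things classes can be integrated into a single panoptic output. The record gives no implementation details, so the following is only a minimal NumPy sketch of one generic way such a merge could work, not the authors' method. The function name `fuse_panoptic`, its parameters, the array layouts, and the 0.5 threshold are all assumptions made for illustration.

```python
import numpy as np


def fuse_panoptic(semantic_scores, soft_masks, mask_classes, threshold=0.5):
    """Merge per-class semantic scores with per-instance soft masks.

    Hypothetical helper for illustration only (not from the paper).
    semantic_scores: float array (C, H, W), one score map per semantic class.
    soft_masks:      float array (N, H, W), one soft mask in [0, 1] per detection.
    mask_classes:    length-N sequence with the class id of each detection.
    Returns (class_map, instance_map), both int32 arrays of shape (H, W).
    instance_map is 0 where no instance claims the pixel, else 1..N.
    """
    # Start from the semantic prediction: highest-scoring class per pixel.
    class_map = semantic_scores.argmax(axis=0).astype(np.int32)
    instance_map = np.zeros(class_map.shape, dtype=np.int32)
    if soft_masks.shape[0] == 0:
        return class_map, instance_map

    # Each pixel is claimed by the instance with the strongest soft mask,
    # but only if that response exceeds the (assumed) threshold.
    winner = soft_masks.argmax(axis=0)
    covered = soft_masks.max(axis=0) > threshold

    # Covered pixels take the class and the 1-based id of the winning instance;
    # all other pixels keep their semantic label and instance id 0.
    classes = np.asarray(mask_classes, dtype=np.int32)
    class_map[covered] = classes[winner[covered]]
    instance_map[covered] = winner[covered] + 1
    return class_map, instance_map
```

In this sketch, a pixel that no instance mask claims keeps its semantic label even if that label is a things class. A real pipeline would need a policy for such pixels, and the summary does not say how the network handles them.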