
Instance Sequence Queries for Video Instance Segmentation with Transformers

Existing methods for video instance segmentation (VIS) mostly rely on two strategies: (1) building a sophisticated post-processing to associate frame level segmentation results and (2) modeling a video clip as a 3D spatial-temporal volume with a limit of resolution and length due to memory constrain...

Bibliographic Details
Main Authors: Xu, Zhujun; Vivet, Damien
Format: Online Article Text
Language: English
Published: MDPI 2021
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8271470/
https://www.ncbi.nlm.nih.gov/pubmed/34209420
http://dx.doi.org/10.3390/s21134507
_version_ 1783721009805787136
author Xu, Zhujun
Vivet, Damien
author_facet Xu, Zhujun
Vivet, Damien
author_sort Xu, Zhujun
collection PubMed
description Existing methods for video instance segmentation (VIS) mostly rely on two strategies: (1) building a sophisticated post-processing to associate frame level segmentation results and (2) modeling a video clip as a 3D spatial-temporal volume with a limit of resolution and length due to memory constraints. In this work, we propose a frame-to-frame method built upon transformers. We use a set of queries, called instance sequence queries (ISQs), to drive the transformer decoder and produce results at each frame. Each query represents one instance in a video clip. By extending the bipartite matching loss to two frames, our training procedure enables the decoder to adjust the ISQs during inference. The consistency of instances is preserved by the corresponding order between query slots and network outputs. As a result, there is no need for complex data association. On TITAN Xp GPU, our method achieves a competitive 34.4% mAP at 33.5 FPS with ResNet-50 and 35.5% mAP at 26.6 FPS with ResNet-101 on the Youtube-VIS dataset.
format Online
Article
Text
id pubmed-8271470
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher MDPI
record_format MEDLINE/PubMed
spelling pubmed-82714702021-07-11 Instance Sequence Queries for Video Instance Segmentation with Transformers Xu, Zhujun Vivet, Damien Sensors (Basel) Article Existing methods for video instance segmentation (VIS) mostly rely on two strategies: (1) building a sophisticated post-processing to associate frame level segmentation results and (2) modeling a video clip as a 3D spatial-temporal volume with a limit of resolution and length due to memory constraints. In this work, we propose a frame-to-frame method built upon transformers. We use a set of queries, called instance sequence queries (ISQs), to drive the transformer decoder and produce results at each frame. Each query represents one instance in a video clip. By extending the bipartite matching loss to two frames, our training procedure enables the decoder to adjust the ISQs during inference. The consistency of instances is preserved by the corresponding order between query slots and network outputs. As a result, there is no need for complex data association. On TITAN Xp GPU, our method achieves a competitive 34.4% mAP at 33.5 FPS with ResNet-50 and 35.5% mAP at 26.6 FPS with ResNet-101 on the Youtube-VIS dataset. MDPI 2021-06-30 /pmc/articles/PMC8271470/ /pubmed/34209420 http://dx.doi.org/10.3390/s21134507 Text en © 2021 by the authors. https://creativecommons.org/licenses/by/4.0/Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
spellingShingle Article
Xu, Zhujun
Vivet, Damien
Instance Sequence Queries for Video Instance Segmentation with Transformers
title Instance Sequence Queries for Video Instance Segmentation with Transformers
title_full Instance Sequence Queries for Video Instance Segmentation with Transformers
title_fullStr Instance Sequence Queries for Video Instance Segmentation with Transformers
title_full_unstemmed Instance Sequence Queries for Video Instance Segmentation with Transformers
title_short Instance Sequence Queries for Video Instance Segmentation with Transformers
title_sort instance sequence queries for video instance segmentation with transformers
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8271470/
https://www.ncbi.nlm.nih.gov/pubmed/34209420
http://dx.doi.org/10.3390/s21134507
work_keys_str_mv AT xuzhujun instancesequencequeriesforvideoinstancesegmentationwithtransformers
AT vivetdamien instancesequencequeriesforvideoinstancesegmentationwithtransformers