Instance Sequence Queries for Video Instance Segmentation with Transformers

Existing methods for video instance segmentation (VIS) mostly rely on two strategies: (1) building sophisticated post-processing to associate frame-level segmentation results and (2) modeling a video clip as a 3D spatial-temporal volume with a limit of resolution and length due to memory constrain...

Bibliographic Details
Main authors:	Xu, Zhujun, Vivet, Damien
Format:	Online Article Text
Language:	English
Published:	MDPI 2021
Subjects:	Article
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8271470/ https://www.ncbi.nlm.nih.gov/pubmed/34209420 http://dx.doi.org/10.3390/s21134507

_version_	1783721009805787136
author	Xu, Zhujun Vivet, Damien
author_facet	Xu, Zhujun Vivet, Damien
author_sort	Xu, Zhujun
collection	PubMed
description	Existing methods for video instance segmentation (VIS) mostly rely on two strategies: (1) building sophisticated post-processing to associate frame-level segmentation results and (2) modeling a video clip as a 3D spatial-temporal volume with a limit of resolution and length due to memory constraints. In this work, we propose a frame-to-frame method built upon transformers. We use a set of queries, called instance sequence queries (ISQs), to drive the transformer decoder and produce results at each frame. Each query represents one instance in a video clip. By extending the bipartite matching loss to two frames, our training procedure enables the decoder to adjust the ISQs during inference. The consistency of instances is preserved by the corresponding order between query slots and network outputs. As a result, there is no need for complex data association. On a TITAN Xp GPU, our method achieves a competitive 34.4% mAP at 33.5 FPS with ResNet-50 and 35.5% mAP at 26.6 FPS with ResNet-101 on the YouTube-VIS dataset.
format	Online Article Text
id	pubmed-8271470
institution	National Center for Biotechnology Information
language	English
publishDate	2021
publisher	MDPI
record_format	MEDLINE/PubMed
spelling	pubmed-82714702021-07-11 Instance Sequence Queries for Video Instance Segmentation with Transformers Xu, Zhujun Vivet, Damien Sensors (Basel) Article Existing methods for video instance segmentation (VIS) mostly rely on two strategies: (1) building a sophisticated post-processing to associate frame level segmentation results and (2) modeling a video clip as a 3D spatial-temporal volume with a limit of resolution and length due to memory constraints. In this work, we propose a frame-to-frame method built upon transformers. We use a set of queries, called instance sequence queries (ISQs), to drive the transformer decoder and produce results at each frame. Each query represents one instance in a video clip. By extending the bipartite matching loss to two frames, our training procedure enables the decoder to adjust the ISQs during inference. The consistency of instances is preserved by the corresponding order between query slots and network outputs. As a result, there is no need for complex data association. On TITAN Xp GPU, our method achieves a competitive 34.4% mAP at 33.5 FPS with ResNet-50 and 35.5% mAP at 26.6 FPS with ResNet-101 on the Youtube-VIS dataset. MDPI 2021-06-30 /pmc/articles/PMC8271470/ /pubmed/34209420 http://dx.doi.org/10.3390/s21134507 Text en © 2021 by the authors. https://creativecommons.org/licenses/by/4.0/Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
spellingShingle	Article Xu, Zhujun Vivet, Damien Instance Sequence Queries for Video Instance Segmentation with Transformers
title	Instance Sequence Queries for Video Instance Segmentation with Transformers
title_full	Instance Sequence Queries for Video Instance Segmentation with Transformers
title_fullStr	Instance Sequence Queries for Video Instance Segmentation with Transformers
title_full_unstemmed	Instance Sequence Queries for Video Instance Segmentation with Transformers
title_short	Instance Sequence Queries for Video Instance Segmentation with Transformers
title_sort	instance sequence queries for video instance segmentation with transformers
topic	Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8271470/ https://www.ncbi.nlm.nih.gov/pubmed/34209420 http://dx.doi.org/10.3390/s21134507
work_keys_str_mv	AT xuzhujun instancesequencequeriesforvideoinstancesegmentationwithtransformers AT vivetdamien instancesequencequeriesforvideoinstancesegmentationwithtransformers

Similar Items