
Tracklet Pair Proposal and Context Reasoning for Video Scene Graph Generation

Video scene graph generation (ViDSGG), the creation of video scene graphs that helps in deeper and better visual scene understanding, is a challenging task. Segment-based and sliding-window based methods have been proposed to perform this task. However, they all have certain limitations. This study...


Bibliographic Details
Main authors:	Jung, Gayoung; Lee, Jonghun; Kim, Incheol
Format:	Online Article Text
Language:	English
Published:	MDPI 2021
Subjects:	Article
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8124611/ https://www.ncbi.nlm.nih.gov/pubmed/34063299 http://dx.doi.org/10.3390/s21093164

collection	PubMed
description	Video scene graph generation (ViDSGG), the creation of video scene graphs that helps in deeper and better visual scene understanding, is a challenging task. Segment-based and sliding-window based methods have been proposed to perform this task. However, they all have certain limitations. This study proposes a novel deep neural network model called VSGG-Net for video scene graph generation. The model uses a sliding window scheme to detect object tracklets of various lengths throughout the entire video. In particular, the proposed model presents a new tracklet pair proposal method that evaluates the relatedness of object tracklet pairs using a pretrained neural network and statistical information. To effectively utilize the spatio-temporal context, low-level visual context reasoning is performed using a spatio-temporal context graph and a graph neural network, together with high-level semantic context reasoning. To improve the detection performance for sparse relationships, the proposed model applies a class weighting technique that assigns higher weights to sparse relationships. This study demonstrates the positive effect and high performance of the proposed model through experiments using the benchmark datasets VidOR and VidVRD.
format	Online Article Text
id	pubmed-8124611
institution	National Center for Biotechnology Information
language	English
publishDate	2021
publisher	MDPI
record_format	MEDLINE/PubMed
spelling	pubmed-8124611 2021-05-17. Sensors (Basel). Article. MDPI, 2021-05-02. Text en. © 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
title	Tracklet Pair Proposal and Context Reasoning for Video Scene Graph Generation
