
Tracklet Pair Proposal and Context Reasoning for Video Scene Graph Generation

Video scene graph generation (ViDSGG), the creation of video scene graphs that support deeper visual scene understanding, is a challenging task. Segment-based and sliding-window-based methods have been proposed for this task, but all of them have certain limitations. This study...


Bibliographic Details
Main authors: Jung, Gayoung; Lee, Jonghun; Kim, Incheol
Format: Online Article Text
Language: English
Published: MDPI 2021
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8124611/
https://www.ncbi.nlm.nih.gov/pubmed/34063299
http://dx.doi.org/10.3390/s21093164
_version_ 1783693258097950720
author Jung, Gayoung
Lee, Jonghun
Kim, Incheol
collection PubMed
description Video scene graph generation (ViDSGG), the creation of video scene graphs that support deeper visual scene understanding, is a challenging task. Segment-based and sliding-window-based methods have been proposed for this task, but all of them have certain limitations. This study proposes a novel deep neural network model called VSGG-Net for video scene graph generation. The model uses a sliding window scheme to detect object tracklets of various lengths throughout the entire video. In particular, the proposed model introduces a new tracklet pair proposal method that evaluates the relatedness of object tracklet pairs using a pretrained neural network and statistical information. To effectively utilize the spatio-temporal context, low-level visual context reasoning is performed using a spatio-temporal context graph and a graph neural network, in addition to high-level semantic context reasoning. To improve the detection performance for sparse relationships, the proposed model applies a class weighting technique that assigns higher weights to sparse relationships. This study demonstrates the positive effect and high performance of the proposed model through experiments on the benchmark datasets VidOR and VidVRD.
format Online
Article
Text
id pubmed-8124611
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher MDPI
record_format MEDLINE/PubMed
spelling pubmed-8124611 2021-05-17 Tracklet Pair Proposal and Context Reasoning for Video Scene Graph Generation Jung, Gayoung Lee, Jonghun Kim, Incheol Sensors (Basel) Article MDPI 2021-05-02 /pmc/articles/PMC8124611/ /pubmed/34063299 http://dx.doi.org/10.3390/s21093164 Text en © 2021 by the authors. https://creativecommons.org/licenses/by/4.0/ Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
title Tracklet Pair Proposal and Context Reasoning for Video Scene Graph Generation
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8124611/
https://www.ncbi.nlm.nih.gov/pubmed/34063299
http://dx.doi.org/10.3390/s21093164
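
The description mentions a class weighting technique that gives sparse relationship classes a higher weight. The record does not say how VSGG-Net computes those weights, so the following is only a minimal sketch of one common scheme (inverse-frequency weighting applied to a cross-entropy term), not the authors' method. The function names and the predicate labels `next_to` and `ride` are hypothetical and chosen purely for illustration.

```python
import math
from collections import Counter


def inverse_frequency_weights(labels):
    """Assumed scheme: weight_c = N / (K * count_c).

    N is the number of training labels and K the number of distinct
    classes, so rare classes get weights above 1 and frequent classes
    get weights below 1.
    """
    counts = Counter(labels)
    total = len(labels)
    num_classes = len(counts)
    return {cls: total / (num_classes * n) for cls, n in counts.items()}


def weighted_cross_entropy(probs, target, weights):
    """Class-weighted negative log-likelihood for a single sample.

    probs maps each class to its predicted probability; target is the
    ground-truth class.
    """
    return -weights[target] * math.log(probs[target])


# Hypothetical training labels: one frequent and one sparse relationship.
labels = ["next_to"] * 8 + ["ride"] * 2
weights = inverse_frequency_weights(labels)

# With identical predicted probabilities, an error on the sparse class
# contributes more to the loss than an error on the frequent class.
probs = {"next_to": 0.5, "ride": 0.5}
loss_sparse = weighted_cross_entropy(probs, "ride", weights)
loss_frequent = weighted_cross_entropy(probs, "next_to", weights)
```

In this sketch the sparse class `ride` receives four times the weight of `next_to`, so the same prediction error on it contributes four times as much to the loss, which is the general effect the description attributes to class weighting.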