Tracklet Pair Proposal and Context Reasoning for Video Scene Graph Generation
Main authors: | Jung, Gayoung; Lee, Jonghun; Kim, Incheol |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | MDPI, 2021 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8124611/ https://www.ncbi.nlm.nih.gov/pubmed/34063299 http://dx.doi.org/10.3390/s21093164 |
_version_ | 1783693258097950720 |
---|---|
author | Jung, Gayoung; Lee, Jonghun; Kim, Incheol |
collection | PubMed |
description | Video scene graph generation (ViDSGG), the creation of video scene graphs that support deeper visual scene understanding, is a challenging task. Segment-based and sliding-window-based methods have been proposed for this task; however, each has certain limitations. This study proposes a novel deep neural network model, called VSGG-Net, for video scene graph generation. The model uses a sliding-window scheme to detect object tracklets of various lengths throughout the entire video. In particular, the model introduces a new tracklet pair proposal method that evaluates the relatedness of object tracklet pairs using a pretrained neural network and statistical information. To make effective use of the spatio-temporal context, the model performs low-level visual context reasoning with a spatio-temporal context graph and a graph neural network, as well as high-level semantic context reasoning. To improve detection performance for sparse relationships, the model applies a class weighting technique that assigns higher weights to sparse relationship classes. Experiments on the benchmark datasets VidOR and VidVRD demonstrate the positive effect and high performance of the proposed model. |
format | Online Article Text |
id | pubmed-8124611 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2021 |
publisher | MDPI |
record_format | MEDLINE/PubMed |
spelling | pubmed-8124611 2021-05-17 Sensors (Basel) Article MDPI 2021-05-02 /pmc/articles/PMC8124611/ /pubmed/34063299 http://dx.doi.org/10.3390/s21093164 Text en © 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/). |
title | Tracklet Pair Proposal and Context Reasoning for Video Scene Graph Generation |
topic | Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8124611/ https://www.ncbi.nlm.nih.gov/pubmed/34063299 http://dx.doi.org/10.3390/s21093164 |
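The description states that VSGG-Net applies a class weighting technique that gives sparse relationship classes higher weights, but the record does not give the formula. As a minimal sketch, assuming inverse-frequency weighting (a common choice, not necessarily the scheme used in VSGG-Net), the idea could look like the Python below. The function names, the `power` parameter, and the relationship labels and counts in the usage lines are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence


def inverse_frequency_weights(labels: Sequence[str], power: float = 1.0) -> Dict[str, float]:
    """Weight each class by (1 / count) ** power, rescaled so the mean weight is 1.0.

    Assumption: inverse-frequency weighting; rarer (sparser) classes get larger weights.
    `power` is a hypothetical knob: 1.0 is plain inverse frequency, 0.5 is a softer variant.
    """
    counts = Counter(labels)
    raw = {cls: (1.0 / n) ** power for cls, n in counts.items()}
    mean = sum(raw.values()) / len(raw)
    return {cls: w / mean for cls, w in raw.items()}


def weighted_cross_entropy(logits: List[float], target: int, weight: float) -> float:
    """Cross-entropy loss of one example, scaled by the weight of its target class."""
    m = max(logits)  # subtract the max logit for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return weight * (log_sum_exp - logits[target])


# Usage with hypothetical relationship label counts (8, 4, and 1 occurrences).
labels = ["next_to"] * 8 + ["in_front_of"] * 4 + ["hold"] * 1
weights = inverse_frequency_weights(labels)
# weights: next_to = 3/11 (about 0.273), in_front_of = 6/11 (about 0.545), hold = 24/11 (about 2.182)
```

Rescaling to a mean of 1.0 keeps the overall loss magnitude comparable to the unweighted case while still shifting emphasis toward the rare classes.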