GammaGAN: Gamma-Scaled Class Embeddings for Conditional Video Generation
In this paper, we propose a new model for conditional video generation (GammaGAN). Generally, it is challenging to generate a plausible video from a single image with a class label as a condition. Traditional methods based on conditional generative adversarial networks (cGANs) often encounter difficulties in effectively utilizing a class label, typically by concatenating a class label to the input or hidden layer. In contrast, the proposed GammaGAN adopts the projection method to effectively utilize a class label and proposes scaling class embeddings and normalizing outputs. Concretely, our proposed architecture consists of two streams: a class embedding stream and a data stream. In the class embedding stream, class embeddings are scaled to effectively emphasize class-specific differences. Meanwhile, the outputs in the data stream are normalized. Our normalization technique balances the outputs of both streams, ensuring a balance between the importance of feature vectors and class embeddings during training. This results in enhanced video quality. We evaluated the proposed method using the MUG facial expression dataset, which consists of six facial expressions. Compared with the prior conditional video generation model, ImaGINator, our model yielded relative improvements of 1.61%, 1.66%, and 0.36% in terms of PSNR, SSIM, and LPIPS, respectively. These results suggest potential for further advancements in conditional video generation.
Main Authors: | Kang, Minjae; Heo, Yong Seok |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | MDPI, 2023 |
Subjects: | |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10575314/ https://www.ncbi.nlm.nih.gov/pubmed/37836933 http://dx.doi.org/10.3390/s23198103 |
_version_ | 1785120895470141440 |
---|---|
author | Kang, Minjae; Heo, Yong Seok |
author_facet | Kang, Minjae; Heo, Yong Seok |
author_sort | Kang, Minjae |
collection | PubMed |
description | In this paper, we propose a new model for conditional video generation (GammaGAN). Generally, it is challenging to generate a plausible video from a single image with a class label as a condition. Traditional methods based on conditional generative adversarial networks (cGANs) often encounter difficulties in effectively utilizing a class label, typically by concatenating a class label to the input or hidden layer. In contrast, the proposed GammaGAN adopts the projection method to effectively utilize a class label and proposes scaling class embeddings and normalizing outputs. Concretely, our proposed architecture consists of two streams: a class embedding stream and a data stream. In the class embedding stream, class embeddings are scaled to effectively emphasize class-specific differences. Meanwhile, the outputs in the data stream are normalized. Our normalization technique balances the outputs of both streams, ensuring a balance between the importance of feature vectors and class embeddings during training. This results in enhanced video quality. We evaluated the proposed method using the MUG facial expression dataset, which consists of six facial expressions. Compared with the prior conditional video generation model, ImaGINator, our model yielded relative improvements of 1.61%, 1.66%, and 0.36% in terms of PSNR, SSIM, and LPIPS, respectively. These results suggest potential for further advancements in conditional video generation. |
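The abstract describes a two-stream projection discriminator: the data stream's scalar output is normalized, while the class-embedding projection term is scaled to emphasize class-specific differences. A minimal plain-Python sketch of how those two streams could be combined into a single discriminator logit is shown below; the exact normalization (here, dividing by the feature norm) and the placement of the learned scale `gamma` are our assumptions for illustration, not details taken from the paper:

```python
import math

def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def gammagan_logit(feat, psi_out, class_emb, gamma):
    """Combine the two discriminator streams described in the abstract.

    feat      : feature vector phi(x) from the data stream
    psi_out   : scalar output psi(phi(x)) of the data stream's final layer
    class_emb : embedding e_y of the conditioning class label (hypothetical name)
    gamma     : learned scale emphasizing class-specific differences

    Assumption: the data-stream output is normalized by the feature norm
    so that neither stream dominates during training.
    """
    norm = math.sqrt(dot(feat, feat)) or 1.0
    data_term = psi_out / norm                  # normalized data stream
    class_term = gamma * dot(class_emb, feat)   # gamma-scaled projection term
    return data_term + class_term

# Example: feat = [3, 4] has norm 5, so the data term is 10/5 = 2.0,
# and the class term is 0.5 * (1*3 + 0*4) = 1.5, giving a logit of 3.5.
logit = gammagan_logit([3.0, 4.0], 10.0, [1.0, 0.0], 0.5)
```

With `gamma = 0` this reduces to an unconditional (normalized) discriminator output, which makes the scale's role as a balance between class information and image realism easy to see.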
format | Online Article Text |
id | pubmed-10575314 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2023 |
publisher | MDPI |
record_format | MEDLINE/PubMed |
spelling | pubmed-105753142023-10-14 GammaGAN: Gamma-Scaled Class Embeddings for Conditional Video Generation Kang, Minjae Heo, Yong Seok Sensors (Basel) Article In this paper, we propose a new model for conditional video generation (GammaGAN). Generally, it is challenging to generate a plausible video from a single image with a class label as a condition. Traditional methods based on conditional generative adversarial networks (cGANs) often encounter difficulties in effectively utilizing a class label, typically by concatenating a class label to the input or hidden layer. In contrast, the proposed GammaGAN adopts the projection method to effectively utilize a class label and proposes scaling class embeddings and normalizing outputs. Concretely, our proposed architecture consists of two streams: a class embedding stream and a data stream. In the class embedding stream, class embeddings are scaled to effectively emphasize class-specific differences. Meanwhile, the outputs in the data stream are normalized. Our normalization technique balances the outputs of both streams, ensuring a balance between the importance of feature vectors and class embeddings during training. This results in enhanced video quality. We evaluated the proposed method using the MUG facial expression dataset, which consists of six facial expressions. Compared with the prior conditional video generation model, ImaGINator, our model yielded relative improvements of 1.61%, 1.66%, and 0.36% in terms of PSNR, SSIM, and LPIPS, respectively. These results suggest potential for further advancements in conditional video generation. MDPI 2023-09-27 /pmc/articles/PMC10575314/ /pubmed/37836933 http://dx.doi.org/10.3390/s23198103 Text en © 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/). |
spellingShingle | Article Kang, Minjae Heo, Yong Seok GammaGAN: Gamma-Scaled Class Embeddings for Conditional Video Generation |
title | GammaGAN: Gamma-Scaled Class Embeddings for Conditional Video Generation |
title_full | GammaGAN: Gamma-Scaled Class Embeddings for Conditional Video Generation |
title_fullStr | GammaGAN: Gamma-Scaled Class Embeddings for Conditional Video Generation |
title_full_unstemmed | GammaGAN: Gamma-Scaled Class Embeddings for Conditional Video Generation |
title_short | GammaGAN: Gamma-Scaled Class Embeddings for Conditional Video Generation |
title_sort | gammagan: gamma-scaled class embeddings for conditional video generation |
topic | Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10575314/ https://www.ncbi.nlm.nih.gov/pubmed/37836933 http://dx.doi.org/10.3390/s23198103 |
work_keys_str_mv | AT kangminjae gammagangammascaledclassembeddingsforconditionalvideogeneration AT heoyongseok gammagangammascaledclassembeddingsforconditionalvideogeneration |