
Image Generation from Text Using StackGAN with Improved Conditional Consistency Regularization


Bibliographic Details
Main authors:	Tominaga, Rihito; Seo, Masataka
Format:	Online Article Text
Language:	English
Published:	MDPI, 2022
Subjects:	Communication
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9823464/ https://www.ncbi.nlm.nih.gov/pubmed/36616847 http://dx.doi.org/10.3390/s23010249

author	Tominaga, Rihito; Seo, Masataka
collection	PubMed
description	Image generation from natural language has become a promising area of research in multimodal learning. Performance on this task has improved rapidly in recent years, and the release of powerful tools has attracted wide attention. The Stacked Generative Adversarial Networks (StackGAN) model is a representative method for generating images from text descriptions. Although it can generate high-resolution images, it has two limitations: some generated images are unintelligible, and mode collapse may occur. In this study, we aim to solve these two problems so that generated images follow a given text description more closely. First, we incorporate into StackGAN a new consistency regularization technique for conditional generation tasks, called Improved Consistency Regularization (ICR). ICR learns the meaning of the data by matching the semantic information of the input data before and after data augmentation, and it can also stabilize the training of adversarial networks. In this research, ICR mainly suppresses mode collapse by increasing the variety of the generated images. However, it can also cause excessive variation, which may yield images that are ambiguous or that do not match the meaning of the input text. We therefore propose a new regularization method, Improved Conditional Consistency Regularization (ICCR), a modification of ICR that is designed for conditional generation tasks and eliminates the generator's negative impact. ICCR generates diverse images that follow the input text. On the CUB dataset, StackGAN with ICCR achieved an Inception Score 16% higher than StackGAN and 4% higher than both StackGAN with ICR and AttnGAN. AttnGAN, like StackGAN, is a GAN-based text-to-image model; it incorporates an attention mechanism and has achieved strong results in recent years. Notably, our model, which adds ICCR to a simpler architecture, outperformed AttnGAN. StackGAN with ICCR also eliminated mode collapse: the probability of mode collapse was 20% for the original StackGAN and 0% for StackGAN with ICCR. In a questionnaire survey, our method was rated 18% higher than StackGAN with ICR, which indicates that ICCR is more effective than ICR for conditional tasks.
format	Online Article Text
id	pubmed-9823464
institution	National Center for Biotechnology Information
language	English
publishDate	2022
publisher	MDPI
record_format	MEDLINE/PubMed
spelling	pubmed-9823464 2023-01-08; Sensors (Basel); Communication; MDPI 2022-12-26; © 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
title	Image Generation from Text Using StackGAN with Improved Conditional Consistency Regularization
topic	Communication
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9823464/ https://www.ncbi.nlm.nih.gov/pubmed/36616847 http://dx.doi.org/10.3390/s23010249
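The ICR mechanism summarized in the abstract (matching the discriminator's response to inputs before and after augmentation) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: it follows the commonly described ICR formulation, which combines balanced consistency regularization (bCR) on augmented real and generated images with latent consistency regularization (zCR) on slightly perturbed latent codes. The toy networks `G` and `D`, the `flip` augmentation, the loss weights, and `sigma_z` are hypothetical choices. The ICCR modification itself is not reproduced because this record does not specify it.

```python
import numpy as np


def sq_dist(a, b):
    """Batch mean of the squared L2 distance between corresponding rows of a and b."""
    a = np.asarray(a, dtype=float).reshape(len(a), -1)
    b = np.asarray(b, dtype=float).reshape(len(b), -1)
    return float(np.mean(np.sum((a - b) ** 2, axis=1)))


def icr_terms(D, G, x_real, z, c, augment, rng,
              lam_real=10.0, lam_fake=10.0, lam_dis=1.0, lam_gen=0.5, sigma_z=0.03):
    """Return (extra discriminator loss, extra generator loss) for an ICR-style regularizer.

    Assumed (hypothetical) interfaces: D(x, c) returns discriminator scores,
    G(z, c) returns images, augment(x, rng) returns an augmented copy of x.
    The weights and sigma_z are illustrative, not values taken from the paper.
    """
    x_fake = G(z, c)

    # bCR: the conditional discriminator should score an image and its
    # augmentation the same way, for both real and generated images.
    bcr = (lam_real * sq_dist(D(x_real, c), D(augment(x_real, rng), c))
           + lam_fake * sq_dist(D(x_fake, c), D(augment(x_fake, rng), c)))

    # zCR: slightly perturb the latent code with Gaussian noise.
    z_pert = z + sigma_z * rng.standard_normal(z.shape)
    x_fake_pert = G(z_pert, c)
    # The discriminator is pushed to respond consistently to the perturbation...
    zcr_d = lam_dis * sq_dist(D(x_fake, c), D(x_fake_pert, c))
    # ...while the generator is rewarded (negative loss) for changing its output,
    # which is what widens the variety of generated images.
    zcr_g = -lam_gen * sq_dist(x_fake, x_fake_pert)

    return bcr + zcr_d, zcr_g


# Toy usage with random linear stand-ins (hypothetical, not StackGAN's networks).
rng = np.random.default_rng(0)
batch, z_dim, img_dim = 4, 3, 8
W_g = rng.standard_normal((z_dim, img_dim))
w_d = rng.standard_normal(img_dim)


def G(z, c):
    return np.tanh(z @ W_g + c)        # "image" conditioned on embedding c


def D(x, c):
    return x @ w_d + c.sum(axis=1)     # one score per sample


def flip(x, rng):
    return x[:, ::-1]                  # stand-in for a horizontal flip


x_real = rng.uniform(-1, 1, (batch, img_dim))
z = rng.standard_normal((batch, z_dim))
c = 0.1 * rng.standard_normal((batch, img_dim))   # stand-in text embedding

d_extra, g_extra = icr_terms(D, G, x_real, z, c, flip, rng)
print(f"extra D loss: {d_extra:.4f}, extra G loss: {g_extra:.4f}")
```

Because the generator term is negative, minimizing it pushes G(z, c) and G(z + Δz, c) apart. This matches the abstract's account of ICR widening the variety of generated images and, taken too far, producing images that drift from the text. The sketch only returns scalar values; a training implementation would compute these terms in an autodiff framework and add them to the usual adversarial losses.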
