
Image Generation from Text Using StackGAN with Improved Conditional Consistency Regularization

Image generation from natural language has become a very promising area of research in multimodal learning. Performance on this task has improved rapidly in recent years, and the release of powerful tools has drawn wide attention. The Stacked Generative Adv...


Bibliographic Details
Main Authors: Tominaga, Rihito, Seo, Masataka
Format: Online Article Text
Language: English
Published: MDPI 2022
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9823464/
https://www.ncbi.nlm.nih.gov/pubmed/36616847
http://dx.doi.org/10.3390/s23010249
_version_ 1784866166172286976
author Tominaga, Rihito
Seo, Masataka
author_facet Tominaga, Rihito
Seo, Masataka
author_sort Tominaga, Rihito
collection PubMed
description Image generation from natural language has become a very promising area of research in multimodal learning. Performance on this task has improved rapidly in recent years, and the release of powerful tools has drawn wide attention. The Stacked Generative Adversarial Networks (StackGAN) model is a representative method for generating images from text descriptions. Although it can generate high-resolution images, it has several limitations: some of the generated images are unintelligible, and mode collapse may occur. In this study, we aim to solve these two problems so that the generated images follow a given text description more closely. First, we incorporate Improved Consistency Regularization (ICR), a consistency regularization technique, into StackGAN for conditional generation tasks. ICR learns the meaning of data by matching the semantic information of the input data before and after data augmentation, and it can also stabilize learning in adversarial networks. In this research, ICR mainly suppresses mode collapse by expanding the variation of the generated images. However, it may lead to excessive variation, which can produce images that are ambiguous or that do not match the meaning of the input text. We therefore propose a new regularization method, Improved Conditional Consistency Regularization (ICCR), as a modification of ICR that is designed for conditional generation tasks and that eliminates the negative impacts of the generator. This method generates varied images that remain consistent with the input text. On the Inception Score using the CUB dataset, the proposed StackGAN with ICCR performed 16% better than StackGAN and 4% better than both StackGAN with ICR and AttnGAN. AttnGAN, like StackGAN, is a GAN-based text-to-image model; it incorporates the attention mechanism and has achieved strong results in recent years. It is notable that our proposed model, which adds ICCR to a simple model, obtained better results than AttnGAN. In addition, StackGAN with ICCR was effective in eliminating mode collapse: the probability of mode collapse was 20% in the original StackGAN and 0% in StackGAN with ICCR. In the questionnaire survey, our proposed method was rated 18% higher than StackGAN with ICR. This indicates that ICCR is more effective than ICR for conditional tasks.
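The consistency idea summarized in the description (matching what a network reports for an input before and after data augmentation) can be sketched as a penalty term. The record does not give the authors' loss, so this is only a minimal illustration of generic consistency regularization under stated assumptions, not ICR or ICCR as published: the function names, the toy flip augmentation, the toy discriminators, and the weight of 10.0 are all hypothetical.

```python
from typing import Callable, List, Sequence

Vector = List[float]


def flip(x: Sequence[float]) -> Vector:
    """Toy augmentation (assumption): reverse the features, standing in for a horizontal flip."""
    return list(reversed(x))


def consistency_penalty(
    discriminator: Callable[[Sequence[float]], float],
    batch: Sequence[Sequence[float]],
    augment: Callable[[Sequence[float]], Vector] = flip,
    weight: float = 10.0,  # hypothetical weight, not taken from the paper
) -> float:
    """Weighted mean squared gap between discriminator outputs on x and on augment(x)."""
    if not batch:
        raise ValueError("batch must not be empty")
    gaps = [(discriminator(x) - discriminator(augment(x))) ** 2 for x in batch]
    return weight * sum(gaps) / len(gaps)


def symmetric_d(x: Sequence[float]) -> float:
    """Toy discriminator whose score ignores feature order, so flipping changes nothing."""
    return sum(x) / len(x)


def asymmetric_d(x: Sequence[float]) -> float:
    """Toy discriminator that looks only at the first feature, so flipping changes its score."""
    return x[0]


batch = [[0.0, 1.0, 2.0], [3.0, 1.0, 0.0]]
print(consistency_penalty(symmetric_d, batch))   # no gap between original and flipped inputs
print(consistency_penalty(asymmetric_d, batch))  # positive penalty for the order-sensitive scorer
```

In a training loop, a term of this kind would be added to the discriminator loss so that scores stay stable under augmentation; how ICCR conditions this on the input text is described in the paper itself and is not reproduced here.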
format Online
Article
Text
id pubmed-9823464
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher MDPI
record_format MEDLINE/PubMed
spelling pubmed-9823464 2023-01-08 Image Generation from Text Using StackGAN with Improved Conditional Consistency Regularization Tominaga, Rihito Seo, Masataka Sensors (Basel) Communication MDPI 2022-12-26 /pmc/articles/PMC9823464/ /pubmed/36616847 http://dx.doi.org/10.3390/s23010249 Text en © 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
spellingShingle Communication
Tominaga, Rihito
Seo, Masataka
Image Generation from Text Using StackGAN with Improved Conditional Consistency Regularization
title Image Generation from Text Using StackGAN with Improved Conditional Consistency Regularization
topic Communication
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9823464/
https://www.ncbi.nlm.nih.gov/pubmed/36616847
http://dx.doi.org/10.3390/s23010249