Image Generation from Text Using StackGAN with Improved Conditional Consistency Regularization
Image generation from natural language has become a very promising area of multimodal learning research. Performance in this area has improved rapidly in recent years, and the release of powerful tools has attracted wide attention. The Stacked Generative Adv...
Main authors: | Tominaga, Rihito; Seo, Masataka |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | MDPI 2022 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9823464/ https://www.ncbi.nlm.nih.gov/pubmed/36616847 http://dx.doi.org/10.3390/s23010249 |
_version_ | 1784866166172286976 |
---|---|
author | Tominaga, Rihito Seo, Masataka |
author_facet | Tominaga, Rihito Seo, Masataka |
author_sort | Tominaga, Rihito |
collection | PubMed |
description | Image generation from natural language has become a very promising area of multimodal learning research. Performance in this area has improved rapidly in recent years, and the release of powerful tools has attracted wide attention. The Stacked Generative Adversarial Networks (StackGAN) model is a representative method for generating images from text descriptions. Although it can generate high-resolution images, it has several limitations: some of the generated images are unintelligible, and mode collapse may occur. In this study, we aim to solve these two problems so that generated images follow a given text description more closely. First, we incorporate a new consistency regularization technique, called Improved Consistency Regularization (ICR), into StackGAN for conditional generation tasks. ICR learns the meaning of data by matching the semantic information of input data before and after data augmentation, and it can also stabilize training in adversarial networks. In this research, ICR mainly suppresses mode collapse by expanding the variation of generated images. However, it may lead to excessive variation, producing images that are ambiguous or that do not match the meaning of the input text. Therefore, we further propose a new regularization method called Improved Conditional Consistency Regularization (ICCR), a modification of ICR that is designed for conditional generation tasks and eliminates the negative impacts of the generator. This method generates diverse images that follow the input text. On the Inception Score using the CUB dataset, the proposed StackGAN with ICCR performed 16% better than StackGAN and 4% better than StackGAN with ICR and AttnGAN. AttnGAN, like StackGAN, is a GAN-based text-to-image model; it incorporates the attention mechanism, which has achieved great results in recent years. Notably, our proposed model, which incorporates ICCR into a simple model, obtained better results than AttnGAN. In addition, StackGAN with ICCR was effective in eliminating mode collapse: the probability of mode collapse was 20% in the original StackGAN and 0% in StackGAN with ICCR. In the questionnaire survey, our proposed method was rated 18% higher than StackGAN with ICR. This indicates that ICCR is more effective for conditional tasks than ICR. |
format | Online Article Text |
id | pubmed-9823464 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2022 |
publisher | MDPI |
record_format | MEDLINE/PubMed |
spelling | pubmed-9823464 2023-01-08 Image Generation from Text Using StackGAN with Improved Conditional Consistency Regularization Tominaga, Rihito Seo, Masataka Sensors (Basel) Communication MDPI 2022-12-26 /pmc/articles/PMC9823464/ /pubmed/36616847 http://dx.doi.org/10.3390/s23010249 Text en © 2022 by the authors. https://creativecommons.org/licenses/by/4.0/ Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/). |
spellingShingle | Communication Tominaga, Rihito Seo, Masataka Image Generation from Text Using StackGAN with Improved Conditional Consistency Regularization |
title | Image Generation from Text Using StackGAN with Improved Conditional Consistency Regularization |
title_sort | image generation from text using stackgan with improved conditional consistency regularization |
topic | Communication |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9823464/ https://www.ncbi.nlm.nih.gov/pubmed/36616847 http://dx.doi.org/10.3390/s23010249 |
work_keys_str_mv | AT tominagarihito imagegenerationfromtextusingstackganwithimprovedconditionalconsistencyregularization AT seomasataka imagegenerationfromtextusingstackganwithimprovedconditionalconsistencyregularization |
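
The description above says that consistency regularization matches the semantic information of an input before and after data augmentation. A minimal sketch of that general idea, not the authors' ICR or ICCR formulation, is a penalty on the squared gap between a discriminator's score for an image and its score for an augmented copy. Everything named here is an assumption for illustration: the toy image type, the horizontal flip as the augmentation, and the two hypothetical scoring functions standing in for a discriminator.

```python
from typing import Callable, List, Sequence

Image = List[List[float]]  # toy grayscale image as rows of pixel values


def horizontal_flip(image: Image) -> Image:
    """Assumed semantics-preserving augmentation: mirror each row."""
    return [list(reversed(row)) for row in image]


def consistency_penalty(
    discriminator: Callable[[Image], float],
    images: Sequence[Image],
    augment: Callable[[Image], Image] = horizontal_flip,
) -> float:
    """Mean squared gap between D(x) and D(augment(x)) over a batch."""
    if not images:
        raise ValueError("images must be non-empty")
    gaps = [(discriminator(x) - discriminator(augment(x))) ** 2 for x in images]
    return sum(gaps) / len(gaps)


def mean_score(image: Image) -> float:
    """Hypothetical discriminator whose score ignores horizontal flips."""
    pixels = [p for row in image for p in row]
    return sum(pixels) / len(pixels)


def left_column_score(image: Image) -> float:
    """Hypothetical discriminator whose score changes under flips."""
    return sum(row[0] for row in image) / len(image)


# Two toy 2x2 images: one asymmetric left-to-right, one uniform.
batch = [[[0.0, 1.0], [0.0, 1.0]], [[0.5, 0.5], [0.5, 0.5]]]
invariant_penalty = consistency_penalty(mean_score, batch)
sensitive_penalty = consistency_penalty(left_column_score, batch)
```

In a training loop, a penalty of this kind would typically be added to the discriminator loss with a weighting coefficient, so that scores which change under the augmentation are discouraged. How the record's ICCR variant conditions this term on the input text is not specified in the description, so it is left out of the sketch.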