
LMGAN: Linguistically Informed Semi-Supervised GAN with Multiple Generators


Bibliographic Details
Main Authors: Cho, Whanhee, Choi, Yongsuk
Format: Online Article Text
Language: English
Published: MDPI 2022
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9695292/
https://www.ncbi.nlm.nih.gov/pubmed/36433358
http://dx.doi.org/10.3390/s22228761
_version_ 1784838019917807616
author Cho, Whanhee
Choi, Yongsuk
author_facet Cho, Whanhee
Choi, Yongsuk
author_sort Cho, Whanhee
collection PubMed
description Semi-supervised learning is an active research topic. Prior work tackles semi-supervised text classification with a generative adversarial network (GAN), but its generator struggles to produce fake data distributions that are similar to the real data distribution. Because the real data distribution changes frequently during training, the generator cannot create adequate fake data. To overcome this problem, we present a novel GAN-based approach to semi-supervised text classification: the Linguistically Informed SeMi-Supervised GAN with Multiple Generators (LMGAN). To reduce the discrepancy between the fake and real data distributions, LMGAN combines a fine-tuned bidirectional encoder representations from transformers (BERT) model with the discriminator from GAN-BERT, and employs multiple generators that draw on BERT's hidden layers. However, since injecting fine-tuned BERT alone could induce an incorrect fake data distribution, we utilize linguistically meaningful intermediate hidden-layer outputs of BERT to enrich the fake data distribution. Our model produces well-distributed fake data, whereas the earlier GAN-based approach failed to generate adequate high-quality fake data. Moreover, with extremely limited amounts of labeled data, LMGAN improves performance by up to 20.0% over the baseline GAN-based model.
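The abstract describes an architecture in which several generators, each conditioned on a different intermediate BERT hidden layer, feed fake representations to a shared discriminator that classifies inputs into the k real classes plus one "fake" class (the k+1 scheme used by semi-supervised GANs such as GAN-BERT). The toy sketch below illustrates only that wiring, not the paper's actual implementation: all layer sizes, the number of generators, and the way noise is conditioned on a hidden state are illustrative assumptions, and the linear layers stand in for real trained networks.

```python
import math
import random

HID = 8          # toy hidden size (BERT-base uses 768)
N_GEN = 3        # one generator per selected intermediate BERT layer (assumed)
N_CLASSES = 4    # real classes; discriminator outputs N_CLASSES + 1 (extra "fake" class)

def linear(x, w, b):
    # y = Wx + b for a single vector x
    return [sum(wi * xi for wi, xi in zip(row, x)) + bi for row, bi in zip(w, b)]

def make_layer(n_out, n_in, seed=0):
    rnd = random.Random(seed)
    w = [[rnd.gauss(0, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    return w, [0.0] * n_out

def softmax(z):
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

# One toy generator per chosen intermediate layer: it maps noise concatenated
# with that layer's hidden state to a fake sentence-level representation.
gen_params = [make_layer(HID, 2 * HID, seed=i) for i in range(N_GEN)]
disc_params = make_layer(N_CLASSES + 1, HID)

def generate_fakes(hidden_states, noise):
    # hidden_states: one HID-dim vector per selected BERT layer
    return [linear(noise + h, w, b) for (w, b), h in zip(gen_params, hidden_states)]

def discriminate(x):
    # Distribution over N_CLASSES real labels plus one "fake" label
    return softmax(linear(x, *disc_params))

rnd = random.Random(1)
noise = [rnd.gauss(0, 1) for _ in range(HID)]
layers = [[rnd.gauss(0, 1) for _ in range(HID)] for _ in range(N_GEN)]
fakes = generate_fakes(layers, noise)
probs = discriminate(fakes[0])
print(len(fakes), len(probs))  # one fake per generator; 5-way discriminator output
```

In training, the generators would be updated to push their outputs toward the real-class regions of the discriminator, while the discriminator learns from labeled, unlabeled, and fake examples; none of that optimization is shown here.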
format Online
Article
Text
id pubmed-9695292
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher MDPI
record_format MEDLINE/PubMed
spelling pubmed-9695292 2022-11-26 LMGAN: Linguistically Informed Semi-Supervised GAN with Multiple Generators Cho, Whanhee Choi, Yongsuk Sensors (Basel) Article MDPI 2022-11-13 /pmc/articles/PMC9695292/ /pubmed/36433358 http://dx.doi.org/10.3390/s22228761 Text en © 2022 by the authors. https://creativecommons.org/licenses/by/4.0/ Licensee MDPI, Basel, Switzerland.
This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
spellingShingle Article
Cho, Whanhee
Choi, Yongsuk
LMGAN: Linguistically Informed Semi-Supervised GAN with Multiple Generators
title LMGAN: Linguistically Informed Semi-Supervised GAN with Multiple Generators
title_full LMGAN: Linguistically Informed Semi-Supervised GAN with Multiple Generators
title_fullStr LMGAN: Linguistically Informed Semi-Supervised GAN with Multiple Generators
title_full_unstemmed LMGAN: Linguistically Informed Semi-Supervised GAN with Multiple Generators
title_short LMGAN: Linguistically Informed Semi-Supervised GAN with Multiple Generators
title_sort lmgan: linguistically informed semi-supervised gan with multiple generators
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9695292/
https://www.ncbi.nlm.nih.gov/pubmed/36433358
http://dx.doi.org/10.3390/s22228761
work_keys_str_mv AT chowhanhee lmganlinguisticallyinformedsemisupervisedganwithmultiplegenerators
AT choiyongsuk lmganlinguisticallyinformedsemisupervisedganwithmultiplegenerators