LMGAN: Linguistically Informed Semi-Supervised GAN with Multiple Generators
Main Authors: | Cho, Whanhee; Choi, Yongsuk |
Format: | Online Article Text |
Language: | English |
Published: | MDPI, 2022 |
Subjects: | Article |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9695292/ https://www.ncbi.nlm.nih.gov/pubmed/36433358 http://dx.doi.org/10.3390/s22228761 |
_version_ | 1784838019917807616 |
author | Cho, Whanhee Choi, Yongsuk |
author_facet | Cho, Whanhee Choi, Yongsuk |
author_sort | Cho, Whanhee |
collection | PubMed |
description | Semi-supervised learning is an active research topic. A previous approach addresses semi-supervised text classification with a generative adversarial network (GAN), but its generator struggles to produce fake data distributions similar to the real data distribution. Because the real data distribution changes frequently during training, the generator cannot create adequate fake data. To overcome this problem, we present a novel GAN-based approach to semi-supervised text classification: the Linguistically Informed SeMi-Supervised GAN with Multiple Generators (LMGAN). To reduce the discrepancy between the fake and real data distributions, LMGAN uses fine-tuned bidirectional encoder representations from transformers (BERT) together with the discriminator from GAN-BERT, and it employs multiple generators that draw on BERT's hidden layers. Since injecting fine-tuned BERT alone could induce an incorrect fake data distribution, we utilize the linguistically meaningful intermediate hidden-layer outputs of BERT to enrich the fake data distribution. Our model produces well-distributed fake data, whereas the earlier GAN-based approach failed to generate adequate high-quality fake data. Moreover, with extremely limited amounts of labeled data, it outperforms the baseline GAN-based model by up to 20.0%. |
format | Online Article Text |
id | pubmed-9695292 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2022 |
publisher | MDPI |
record_format | MEDLINE/PubMed |
spelling | pubmed-9695292 2022-11-26 LMGAN: Linguistically Informed Semi-Supervised GAN with Multiple Generators Cho, Whanhee Choi, Yongsuk Sensors (Basel) Article MDPI 2022-11-13 /pmc/articles/PMC9695292/ /pubmed/36433358 http://dx.doi.org/10.3390/s22228761 Text en © 2022 by the authors. https://creativecommons.org/licenses/by/4.0/ Licensee MDPI, Basel, Switzerland.
This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/). |
spellingShingle | Article Cho, Whanhee Choi, Yongsuk LMGAN: Linguistically Informed Semi-Supervised GAN with Multiple Generators |
title | LMGAN: Linguistically Informed Semi-Supervised GAN with Multiple Generators |
title_full | LMGAN: Linguistically Informed Semi-Supervised GAN with Multiple Generators |
title_fullStr | LMGAN: Linguistically Informed Semi-Supervised GAN with Multiple Generators |
title_full_unstemmed | LMGAN: Linguistically Informed Semi-Supervised GAN with Multiple Generators |
title_short | LMGAN: Linguistically Informed Semi-Supervised GAN with Multiple Generators |
title_sort | lmgan: linguistically informed semi-supervised gan with multiple generators |
topic | Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9695292/ https://www.ncbi.nlm.nih.gov/pubmed/36433358 http://dx.doi.org/10.3390/s22228761 |
work_keys_str_mv | AT chowhanhee lmganlinguisticallyinformedsemisupervisedganwithmultiplegenerators AT choiyongsuk lmganlinguisticallyinformedsemisupervisedganwithmultiplegenerators |
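The abstract describes a GAN-BERT-style setup: a (k+1)-way discriminator whose extra class marks fake samples, plus multiple generators that each draw on a different intermediate hidden layer of BERT. The toy sketch below illustrates only that wiring; the vector sizes, tapped layers, and all function bodies are illustrative assumptions, not the paper's actual architecture or training procedure.

```python
import math
import random

NUM_CLASSES = 2           # real classes (toy assumption)
FAKE_CLASS = NUM_CLASSES  # index of the extra "fake" label, as in GAN-BERT
HIDDEN = 8                # toy hidden size (BERT-base actually uses 768)
TAPPED_LAYERS = [4, 8, 12]  # assumed intermediate BERT layers, one per generator

def toy_bert_hidden(layer, dim=HIDDEN):
    """Stand-in for a BERT intermediate hidden state (random toy vector)."""
    rng = random.Random(layer)
    return [rng.gauss(0.0, 1.0) for _ in range(dim)]

def generator(hidden, noise):
    """Toy generator: mixes a hidden-layer vector with random noise."""
    return [h + n for h, n in zip(hidden, noise)]

def discriminator(vec):
    """Toy (k+1)-way discriminator: softmax over k real classes + fake."""
    scores = [sum(vec) * (c + 1) * 0.1 for c in range(NUM_CLASSES + 1)]
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]

rng = random.Random(0)
fakes = []
for layer in TAPPED_LAYERS:  # one generator per tapped BERT layer
    noise = [rng.gauss(0.0, 1.0) for _ in range(HIDDEN)]
    fakes.append(generator(toy_bert_hidden(layer), noise))

# During training, fake samples would be pushed toward FAKE_CLASS while
# real (labeled) samples keep their original class labels.
probs = [discriminator(f) for f in fakes]
print(len(fakes), FAKE_CLASS)  # 3 generators, fake label index 2
```

The point of the sketch is the label scheme: the discriminator never sees a separate real/fake head; the "fake" decision is just one more class, which is what lets unlabeled and generated data share the same classification loss.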