LMGAN: Linguistically Informed Semi-Supervised GAN with Multiple Generators
Main Authors: | Cho, Whanhee; Choi, Yongsuk |
Format: | Online Article Text |
Language: | English |
Published: | MDPI, 2022 |
Subjects: | Article |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9695292/ https://www.ncbi.nlm.nih.gov/pubmed/36433358 http://dx.doi.org/10.3390/s22228761 |
_version_ | 1784838019917807616 |
author | Cho, Whanhee Choi, Yongsuk |
author_facet | Cho, Whanhee Choi, Yongsuk |
author_sort | Cho, Whanhee |
collection | PubMed |
description | Semi-supervised learning is an active research topic. A previous approach addresses semi-supervised text classification with a generative adversarial network (GAN), but its generator struggles to produce fake data distributions similar to the real data distribution. Because the real data distribution changes frequently during training, the generator cannot create adequate fake data. To overcome this problem, we present a novel GAN-based approach to semi-supervised text classification: the Linguistically Informed SeMi-Supervised GAN with Multiple Generators (LMGAN). To reduce the discrepancy between the fake and real data distributions, LMGAN uses fine-tuned bidirectional encoder representations from transformers (BERT) together with the discriminator from GAN-BERT, and it employs multiple generators that draw on BERT's hidden layers. Since injecting fine-tuned BERT alone could induce an incorrect fake data distribution, we utilize the linguistically meaningful intermediate hidden-layer outputs of BERT to enrich the fake data distribution. Our model produces well-distributed fake data, whereas the earlier GAN-based approach failed to generate adequate high-quality fake data. Moreover, with extremely limited amounts of labeled data, it outperforms the baseline GAN-based model by up to 20.0%. |
format | Online Article Text |
id | pubmed-9695292 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2022 |
publisher | MDPI |
record_format | MEDLINE/PubMed |
spelling | pubmed-9695292 2022-11-26 LMGAN: Linguistically Informed Semi-Supervised GAN with Multiple Generators Cho, Whanhee Choi, Yongsuk Sensors (Basel) Article MDPI 2022-11-13 /pmc/articles/PMC9695292/ /pubmed/36433358 http://dx.doi.org/10.3390/s22228761 Text en © 2022 by the authors. https://creativecommons.org/licenses/by/4.0/ Licensee MDPI, Basel, Switzerland.
This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/). |
spellingShingle | Article Cho, Whanhee Choi, Yongsuk LMGAN: Linguistically Informed Semi-Supervised GAN with Multiple Generators |
title | LMGAN: Linguistically Informed Semi-Supervised GAN with Multiple Generators |
title_full | LMGAN: Linguistically Informed Semi-Supervised GAN with Multiple Generators |
title_fullStr | LMGAN: Linguistically Informed Semi-Supervised GAN with Multiple Generators |
title_full_unstemmed | LMGAN: Linguistically Informed Semi-Supervised GAN with Multiple Generators |
title_short | LMGAN: Linguistically Informed Semi-Supervised GAN with Multiple Generators |
title_sort | lmgan: linguistically informed semi-supervised gan with multiple generators |
topic | Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9695292/ https://www.ncbi.nlm.nih.gov/pubmed/36433358 http://dx.doi.org/10.3390/s22228761 |
work_keys_str_mv | AT chowhanhee lmganlinguisticallyinformedsemisupervisedganwithmultiplegenerators AT choiyongsuk lmganlinguisticallyinformedsemisupervisedganwithmultiplegenerators |
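The abstract describes a GAN-BERT-style setup: a (k+1)-way discriminator whose extra class marks fake samples, plus multiple generators that each draw on a different intermediate hidden layer of BERT. The toy sketch below illustrates only that wiring; the vector sizes, tapped layers, and all function bodies are illustrative assumptions, not the paper's actual architecture or training procedure.

```python
import math
import random

NUM_CLASSES = 2           # real classes (toy assumption)
FAKE_CLASS = NUM_CLASSES  # index of the extra "fake" label, as in GAN-BERT
HIDDEN = 8                # toy hidden size (BERT-base actually uses 768)
TAPPED_LAYERS = [4, 8, 12]  # assumed intermediate BERT layers, one per generator

def toy_bert_hidden(layer, dim=HIDDEN):
    """Stand-in for a BERT intermediate hidden state (random toy vector)."""
    rng = random.Random(layer)
    return [rng.gauss(0.0, 1.0) for _ in range(dim)]

def generator(hidden, noise):
    """Toy generator: mixes a hidden-layer vector with random noise."""
    return [h + n for h, n in zip(hidden, noise)]

def discriminator(vec):
    """Toy (k+1)-way discriminator: softmax over k real classes + fake."""
    scores = [sum(vec) * (c + 1) * 0.1 for c in range(NUM_CLASSES + 1)]
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]

rng = random.Random(0)
fakes = []
for layer in TAPPED_LAYERS:  # one generator per tapped BERT layer
    noise = [rng.gauss(0.0, 1.0) for _ in range(HIDDEN)]
    fakes.append(generator(toy_bert_hidden(layer), noise))

# During training, fake samples would be pushed toward FAKE_CLASS while
# real (labeled) samples keep their original class labels.
probs = [discriminator(f) for f in fakes]
print(len(fakes), FAKE_CLASS)  # 3 generators, fake label index 2
```

The point of the sketch is the label scheme: the discriminator never sees a separate real/fake head; the "fake" decision is just one more class, which is what lets unlabeled and generated data share the same classification loss.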