
Generative adversarial network based synthetic data training model for lightweight convolutional neural networks


Bibliographic Details
Main Authors: Rather, Ishfaq Hussain; Kumar, Sushil
Format: Online Article Text
Language: English
Published: Springer US 2023
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10199442/
https://www.ncbi.nlm.nih.gov/pubmed/37362646
http://dx.doi.org/10.1007/s11042-023-15747-6
author Rather, Ishfaq Hussain
Kumar, Sushil
collection PubMed
description Inadequate training data is a significant challenge for deep learning, particularly in applications where data is difficult to obtain and publicly available datasets are scarce owing to ethical and privacy concerns. Approaches such as data augmentation and transfer learning are used to address this problem and mitigate it to some extent. However, beyond a certain amount of data augmentation the quality of the generated data stalls, and transfer learning suffers from negative transfer. This paper proposes a novel generative adversarial network-based synthetic data training (GAN-ST) model that generates synthetic data for training a lightweight convolutional neural network (CNN). An enhanced generator is proposed to quickly saturate and cover the colour space of the training distribution. The GAN-ST model is based on the Deep Convolutional Generative Adversarial Network (DCGAN) and Conditional Generative Adversarial Network (CGAN) models, both of which incorporate an enhanced generator. The study evaluates the accuracy of a CNN on the MNIST and CIFAR-10 datasets using both original and synthetic data. On MNIST, the classifier reaches 99.38% accuracy when trained on GAN-ST-generated synthetic data, only 0.05% lower than when trained on the original data. On CIFAR-10, the classifier reaches 90.23% accuracy. Compared with training on synthetic data from a single GAN, GAN-ST-based training improves accuracy by up to 0.66% on MNIST and 7.06% on CIFAR-10. By training two GANs independently, the GAN-ST model covers different parts of the original data distribution, yielding a more diverse and realistic training set for the classifier. A CNN trained on this diverse synthetic data generalizes better to new data, leading to improved classification accuracy.
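The abstract's central idea, pooling synthetic samples from two independently trained GANs into one classifier training set, can be illustrated with a minimal sketch. This is not the authors' implementation: `fake_generator` is a hypothetical stand-in for a trained generator, the assumption that each generator can emit a sample for a requested class label is ours, and the "images" are just short lists of random floats.

```python
import random

NUM_CLASSES = 10  # MNIST and CIFAR-10 both have ten classes


def fake_generator(seed, image_size=4):
    """Hypothetical placeholder for a trained generator.

    Returns a function that maps a class label to one synthetic "image",
    represented here as a flat list of floats in [0, 1). The label is
    ignored by this stand-in; a real conditional generator would use it.
    """
    rng = random.Random(seed)

    def sample(label):
        return [rng.random() for _ in range(image_size * image_size)]

    return sample


def build_synthetic_set(generators, per_class_per_generator, seed=0):
    """Pool labelled samples from several independently trained generators."""
    dataset = []
    for sample in generators:
        for label in range(NUM_CLASSES):
            for _ in range(per_class_per_generator):
                dataset.append((sample(label), label))
    # Shuffle so that batches mix samples from both sources.
    random.Random(seed).shuffle(dataset)
    return dataset


# Two stand-ins for the two independently trained GANs.
gan_a = fake_generator(seed=1)
gan_b = fake_generator(seed=2)
train_set = build_synthetic_set([gan_a, gan_b], per_class_per_generator=5)
```

In this sketch every class receives the same number of samples from each generator; the record does not say how the paper balances the two sources, so the equal split is only an illustrative choice.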
format Online
Article
Text
id pubmed-10199442
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher Springer US
record_format MEDLINE/PubMed
spelling pubmed-10199442 2023-05-23 Multimed Tools Appl Article Springer US 2023-05-20 /pmc/articles/PMC10199442/ /pubmed/37362646 http://dx.doi.org/10.1007/s11042-023-15747-6 Text en © The Author(s), under exclusive licence to Springer Science+Business Media, LLC, part of Springer Nature 2023. Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law. This article is made available via the PMC Open Access Subset for unrestricted research re-use and secondary analysis in any form or by any means with acknowledgement of the original source. These permissions are granted for the duration of the World Health Organization (WHO) declaration of COVID-19 as a global pandemic.
title Generative adversarial network based synthetic data training model for lightweight convolutional neural networks
topic Article