
Dataset Condensation via Expert Subspace Projection

The rapid growth in dataset sizes in modern deep learning has significantly increased data storage costs. Furthermore, the training and time costs for deep neural networks are generally proportional to the dataset size. Therefore, reducing the dataset size while maintaining model performance is an u...

Bibliographic Details
Main authors:	Ma, Zhiheng, Gao, Dezheng, Yang, Shaolei, Wei, Xing, Gong, Yihong
Format:	Online Article Text
Language:	English
Published:	MDPI 2023
Subjects:	Article
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10574980/ https://www.ncbi.nlm.nih.gov/pubmed/37836977 http://dx.doi.org/10.3390/s23198148

_version_	1785120816415899648
author	Ma, Zhiheng Gao, Dezheng Yang, Shaolei Wei, Xing Gong, Yihong
author_sort	Ma, Zhiheng
collection	PubMed
description	The rapid growth in dataset sizes in modern deep learning has significantly increased data storage costs. Furthermore, the training and time costs for deep neural networks are generally proportional to the dataset size. Therefore, reducing the dataset size while maintaining model performance is an urgent research problem that needs to be addressed. Dataset condensation is a technique that aims to distill the original dataset into a much smaller synthetic dataset while maintaining downstream training performance on any agnostic neural network. Previous work has demonstrated that matching the training trajectory between the synthetic dataset and the original dataset is more effective than matching the instantaneous gradient, as it incorporates long-range information. Despite the effectiveness of trajectory matching, it suffers from complex gradient unrolling across iterations, which leads to significant memory and computation overhead. To address this issue, this paper proposes a novel approach called Expert Subspace Projection (ESP), which leverages long-range information while avoiding gradient unrolling. Instead of strictly enforcing the synthetic dataset’s training trajectory to mimic that of the real dataset, ESP only constrains it to lie within the subspace spanned by the training trajectory of the real dataset. The memory-saving advantage offered by our method facilitates unbiased training on the complete set of synthetic images and seamless integration with other dataset condensation techniques. Through extensive experiments, we have demonstrated the effectiveness of our approach. Our method outperforms the trajectory matching method on CIFAR10 by 16.7% in the setting of 1 Image/Class, surpassing the previous state-of-the-art method by 3.2%.
format	Online Article Text
id	pubmed-10574980
institution	National Center for Biotechnology Information
language	English
publishDate	2023
publisher	MDPI
record_format	MEDLINE/PubMed
spelling	pubmed-10574980 2023-10-14 Dataset Condensation via Expert Subspace Projection Ma, Zhiheng Gao, Dezheng Yang, Shaolei Wei, Xing Gong, Yihong Sensors (Basel) Article MDPI 2023-09-28 /pmc/articles/PMC10574980/ /pubmed/37836977 http://dx.doi.org/10.3390/s23198148 Text en © 2023 by the authors. https://creativecommons.org/licenses/by/4.0/ Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
title	Dataset Condensation via Expert Subspace Projection
topic	Article
