
Scalable Unsupervised Learning for Deep Discrete Generative Models

Bibliographic Details
Main author: Guiraud, Enrico
Language: English
Published: University of Oldenburg, 2021
Subjects:
Online access: http://cds.cern.ch/record/2775417
Description
Summary: Efficient, scalable training of probabilistic generative models is a highly sought-after goal in the field of machine learning. One core challenge is that maximum likelihood optimization of generative parameters is computationally intractable for all but a few, mostly elementary, models. Variational approximations of the Expectation-Maximization (EM) algorithm offer a generic, powerful framework for deriving training algorithms as a function of the chosen form of variational distributions. Furthermore, the use of discrete latent variables in such generative models is considered important for capturing the generative process of real-world data, which has, for instance, motivated research on Variational Autoencoders (VAEs) with discrete latents. Here we make use of truncated posteriors as variational distributions and show how the resulting variational approximation of the EM algorithm can be used to establish a close link between evolutionary algorithms (EAs) and the training of probabilistic generative models with binary latent variables. We obtain training algorithms that effectively improve the tractable likelihood lower bound defined by truncated posteriors. After verifying the applicability and scalability of this novel EA-based training on shallow models, we demonstrate how the technique can be combined with standard optimization of a deep generative model's parameters using auto-differentiation tools and backpropagation, in order to train discrete-latent VAEs. Our approach departs significantly from standard VAE training and sidesteps some of its standard features, such as sampling approximations, the reparameterization trick, and amortization. For quantitative comparison with other approaches, we use a common image denoising benchmark. In contrast to supervised neural networks, VAEs can denoise a single image without prior training on clean data or on large image datasets. While using a relatively elementary network architecture, we find our model to be competitive with the state of the art in this "zero-shot" setting. A review of the open-source software framework developed for training discrete-latent generative models with truncated posterior approximations is also provided. Our results suggest that EA-based training of discrete-latent VAEs can represent a well-performing, flexible, scalable, and arguably more direct training scheme than previously proposed alternatives, opening the door to a large number of possible future research directions.
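
For orientation, the following is a brief sketch of the quantities the abstract refers to; the notation (per-data-point state sets \mathcal{K}_n, parameters \Theta) is an assumption of this sketch and is not taken from the record itself. A truncated posterior restricts the exact posterior to a small set of latent states, and the corresponding likelihood lower bound (free energy) reduces to a tractable sum over those states:

    q_n(s;\,\mathcal{K}_n,\Theta) \;=\; \frac{p(s \mid y^{(n)}, \Theta)}{\sum_{s' \in \mathcal{K}_n} p(s' \mid y^{(n)}, \Theta)}\;\delta(s \in \mathcal{K}_n),
    \qquad
    \mathcal{F}(\mathcal{K},\Theta) \;=\; \sum_n \log \sum_{s \in \mathcal{K}_n} p(y^{(n)}, s \mid \Theta) \;\le\; \sum_n \log p(y^{(n)} \mid \Theta).

The link to evolutionary algorithms arises because improving this bound with respect to the variational distributions only requires finding better states to include in each set \mathcal{K}_n. Below is a minimal, purely illustrative Python sketch of such an EA-style E-step; the model (binary sparse coding with a linear Gaussian decoder), function names, and hyperparameters are assumptions made for illustration, not the thesis' actual implementation.

    import numpy as np

    rng = np.random.default_rng(0)

    def log_joint(y, S, W, sigma2, pi):
        """log p(y, s | Theta) for each row s of S, under a linear Gaussian
        observable and a Bernoulli prior (illustrative model choice)."""
        mean = S @ W                                   # decoded means, one per state
        ll = -0.5 * np.sum((y - mean) ** 2, axis=1) / sigma2 \
             - 0.5 * y.size * np.log(2 * np.pi * sigma2)
        lp = np.sum(S * np.log(pi) + (1 - S) * np.log(1 - pi), axis=1)
        return ll + lp

    def ea_e_step(y, K, W, sigma2, pi, n_children=8):
        """Evolve the truncated set K_n: mutate parent states by single bit
        flips and keep the fittest states, with the log-joint as fitness."""
        n_states, H = K.shape
        parents = K[rng.integers(n_states, size=n_children)]
        flips = rng.integers(H, size=n_children)
        children = parents.copy()
        children[np.arange(n_children), flips] ^= 1    # single-bit mutations
        pool = np.unique(np.vstack([K, children]), axis=0)
        fitness = log_joint(y, pool, W, sigma2, pi)
        return pool[np.argsort(fitness)[-n_states:]]   # keep the best states

    def free_energy(Y, Ks, W, sigma2, pi):
        """Tractable lower bound: sum_n log sum_{s in K_n} p(y_n, s | Theta)."""
        from scipy.special import logsumexp
        return sum(logsumexp(log_joint(y, K, W, sigma2, pi))
                   for y, K in zip(Y, Ks))

In a full training loop, an E-step of this kind would be alternated (or, as the abstract describes for deep decoders, mixed) with gradient-based updates of the decoder parameters, obtained by backpropagating through the free energy.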