Séminaire Images Optimisation et Probabilités
Shedding some light on image generative models
David Picard
(ENPC), Conference room
30 January 2025, 11:15
Image generative models have something magical about them: starting from pure noise, they are able to produce stunning images of complex scenes aligned with the text given as a condition. How can they produce something from what looks like nothing? In this talk, I will use two of my recent publications to try to shed some light on this paradox. In the first work¹, we use a continuity argument to investigate how each point of the Gaussian distribution, used for sampling the initial noise, is mapped to an image. We show that this induces a partition of the Gaussian, and that not all partitions are equally good at generating high-quality images. Optimal partitions can help design faster training algorithms, since a good partition no longer has to be discovered during training. In the second work², we focus on diffusion models, and more specifically on Classifier-Free Guidance (CFG). Diffusion models map from the noise to the images by training a neural network that predicts the vector field of the corresponding ODE (or SDE). CFG is an acceleration technique that biases the predicted vector field so that the ODE can be solved in fewer steps. We investigate the influence of CFG at different timesteps of the ODE and show that its effect on the image depends dramatically on the timestep. We propose very simple (parameter-free) changes to mitigate negative effects, but we also show that more involved (parametrized) methods can lead to severe overfitting.
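The CFG bias on the predicted vector field mentioned above can be sketched as follows. This is a minimal illustration, not the speaker's implementation: `cfg_prediction`, `linear_weight`, and the toy values are hypothetical, and the linear schedule merely illustrates the idea of varying the guidance weight with the timestep.

```python
import numpy as np

def cfg_prediction(eps_uncond, eps_cond, w):
    # Classifier-Free Guidance: push the predicted noise / vector field
    # along the conditional direction with guidance weight w.
    # w = 1 recovers the purely conditional prediction; larger w
    # amplifies the conditioning signal.
    return eps_uncond + w * (eps_cond - eps_uncond)

def linear_weight(t, w_max):
    # Illustrative timestep-dependent schedule (t in [0, 1]):
    # the weight grows linearly instead of staying constant,
    # so guidance is applied differently at different ODE timesteps.
    return 1.0 + (w_max - 1.0) * t

# Toy example with hypothetical network outputs (no trained model):
eps_u = np.array([0.0, 1.0])   # unconditional prediction
eps_c = np.array([1.0, 1.0])   # conditional prediction
for t in (0.0, 0.5, 1.0):
    print(t, cfg_prediction(eps_u, eps_c, linear_weight(t, w_max=3.0)))
```

With a constant weight, the same bias is applied at every solver step; a schedule like `linear_weight` makes the strength of that bias depend on the timestep, which is the kind of effect the second work analyzes.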
¹Unveiling the Latent Space Geometry of Push-Forward Generative Models. T. Issenhuth, U. Tanielian, J. Mary, D. Picard. ICML 2023.
²Analysis of Classifier-Free Guidance Weight Schedulers. X. Wang, N. Dufour, N. Andreou, M.-P. Cani, V. F. Abrevaya, D. Picard, V. Kalogeiton. TMLR 2024.