Organizers: Luis Fredes and Camille Male
Neural style transfer (NST) is a deep learning technique that produces an unprecedentedly rich style transfer from a style image to a content image. It is particularly impressive when it comes to transferring style from a painting to an image. NST was originally achieved by solving an optimization problem to match the global statistics of the style image while preserving the local geometric features of the content image. The two main drawbacks of this original approach are that it is computationally expensive and that the resolution of the output images is limited by high GPU memory requirements. Many solutions have been proposed both to accelerate NST and to produce larger images. However, our investigation shows that these accelerated methods all compromise the quality of the produced images in the context of painting style transfer. Indeed, transferring the style of a painting is a complex task involving features at different scales, from the color palette and compositional style to the fine brushstrokes and texture of the canvas. This paper provides a solution to the original global optimization for ultra-high resolution (UHR) images, enabling multiscale NST at unprecedented image sizes. This is achieved by spatially localizing the computation of each forward and backward pass through the VGG network. Extensive qualitative and quantitative comparisons, as well as a user study, show that our method produces style transfer of unmatched quality for such high-resolution painting styles. A careful comparison shows that state-of-the-art fast methods are still prone to artifacts, suggesting that fast painting style transfer remains an open problem.
Joint work with Lara Raad, José Lezama and Jean-Michel Morel.
In this talk, I will present the results of a collaboration with Benjamin McKenna on the injective norm of large random Gaussian tensors and uniform random quantum states and, time permitting, describe some of the context underlying this work. The injective norm is a natural generalization to tensors of the operator norm of a matrix and appears in multiple fields. In quantum information, the injective norm is one important measure of genuine multipartite entanglement of quantum states, known as geometric entanglement. In our recent preprint, we provide high-probability upper bounds, in different regimes, on the injective norm of real and complex Gaussian random tensors, which correspond to lower bounds on the geometric entanglement of random quantum states, and to bounds on the ground-state energy of a particular multispecies spherical spin glass model. Our result represents a first step towards solving an important question in quantum information that has long been part of the folklore.
Optimal Transport is a useful metric to compare probability distributions and to compute a pairing given a ground cost. Its entropic regularization variant (eOT) is crucial to obtain fast algorithms and to reflect fuzzy/noisy matchings. This work focuses on Inverse Optimal Transport (iOT), the problem of inferring the ground cost from samples drawn from a coupling that solves an eOT problem. It is a relevant problem that can be used to infer unobserved/missing links, and to obtain meaningful information about the structure of the ground cost yielding the pairing. On the one hand, iOT benefits from convexity; on the other hand, being ill-posed, it requires regularization to handle the sampling noise. This work presents an in-depth theoretical study of l1 regularization to model, for instance, Euclidean costs with sparse interactions between features. Specifically, we derive a sufficient condition for the robust recovery of the sparsity of the ground cost that can be seen as a far-reaching generalization of the Lasso's celebrated Irrepresentability Condition. To provide additional insight into this condition, we work out the Gaussian case in detail. We show that as the entropic penalty varies, the iOT problem interpolates between a graphical Lasso and a classical Lasso, thereby establishing a connection between iOT and graph estimation, an important problem in ML.
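The forward eOT problem referenced here is classically solved by Sinkhorn's algorithm, which alternately rescales a Gibbs kernel to match the two marginals. A minimal sketch on a discrete cost matrix (the cost, regularization strength, and iteration count are illustrative choices):

```python
import math

def sinkhorn(mu, nu, C, eps=0.1, iters=200):
    """Entropic OT via Sinkhorn iterations: returns the coupling P
    minimizing <P, C> - eps * H(P) with marginals mu and nu."""
    K = [[math.exp(-c / eps) for c in row] for row in C]  # Gibbs kernel
    n, m = len(mu), len(nu)
    u, v = [1.0] * n, [1.0] * m
    for _ in range(iters):
        # alternately rescale rows and columns to fit the marginals
        u = [mu[i] / sum(K[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [nu[j] / sum(K[i][j] * u[i] for i in range(n)) for j in range(m)]
    return [[u[i] * K[i][j] * v[j] for j in range(m)] for i in range(n)]
```

The inverse problem studied in the talk goes the other way: given samples from such a coupling P, recover (a sparse) C.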
Event page: https://indico.math.cnrs.fr/event/11353/overview
Wednesday 10/07
14h00 Jurgen Angst (Univ. Rennes)
Title : CLT in total variation for beta-ensembles
Abstract : In this talk, we study the fluctuations of linear statistics associated with beta-ensembles, statistical physics models that generalize random matrix spectra. In the specific context of random matrices (e.g. GOE, GUE), the "law of large numbers" is Wigner's theorem, which states that the empirical measure of the eigenvalues converges to the semicircle law, and the fluctuations around equilibrium can be shown to be Gaussian. We will describe how this result generalizes to beta-ensembles and how the speed of convergence to the normal distribution can be quantified. We obtain optimal rates of convergence for the total variation distance and the Wasserstein distances. To do this, we introduce a variant of Stein's method for a generator
15h00 Nicolas Juillet (Univ. Haute-Alsace)
Title : Exact interpolation of 1-marginals
Abstract : I shall present a new type of martingales that exactly interpolate any given family of 1-dimensional marginals on R (satisfying the suitable necessary assumption). The construction makes use of ideas from (martingale) optimal transportation theory and relies on different stochastic orders. I shall discuss related constructions and open questions (joint work with Brückerhoff and Huesmann).
16h00 Kolehe Coulibaly-Pasquier (Inst. Élie Cartan)
Title : On the separation cut-off phenomenon for Brownian motions on high-dimensional rotationally symmetric compact manifolds
Abstract : Given a family of compact, rotationally symmetric manifolds indexed by the dimension and a weight function, we will study the cut-off phenomenon for Brownian motion on this family.
Our proof is based on the construction of an intertwined process, a strong stationary time, an estimate of the moments of the covering time of the dual process, and on the concentration of measure phenomenon.
We will see a phase transition concerning the existence or not of a cut-off phenomenon, which depends on the shape of the weight function.
Jeudi 11/07
11h00 Masha Gordina (Univ. of Connecticut)
Title : Dimension-independent functional inequalities on sub-Riemannian manifolds
Abstract : The talk will review recent results on gradient estimates, log-Sobolev inequalities, and reverse Poincaré and log-Sobolev inequalities on a class of sub-Riemannian manifolds. As curvature bounds are not available in many such settings, we use different techniques, including tensorization and taking quotients. Joint work with F. Baudoin, L. Luo and R. Sarkar.
In this thesis we study couplings of subelliptic Brownian motions in several sub-Riemannian manifolds: the free, step
Taking inspiration from previous works on the Heisenberg group, we obtain successful non-co-adapted couplings on
Finally, we develop a new coupling model "in one sweep" for any free, step
The main motivation of this talk is to find strong stationary times for Markov processes (X_t), that is, stopping times T such that X_T is at equilibrium and T and X_T are independent. To find strong stationary times T, it is natural, and in some cases very easy, to use dual processes (D_t) such that T is the hitting time of a certain state by the dual process. We will study the intertwining between (X_t) and (D_t). We will give examples for Markov chains on finite state spaces, and then turn to Brownian motion with set-valued dual processes. Pitman's surprising "2M-X" theorem gives an example of an intertwining of Brownian motion on the circle. We will generalize this theorem to compact Riemannian manifolds and construct strong stationary times. We will study the phase transition in high dimension. Finally, we will consider measure-valued duals.
This talk is an introduction, through examples, to a theory of causality developed since the late 1990s by Judea Pearl. It earned him part of his 2011 ACM Turing Award, the computer-science equivalent of the Abel Prize. We will consider a classical model whose hypotheses are formulated by a causal graph, featuring in particular an unobservable common cause and an ethically uncontrollable variable. Adopting computer-science vocabulary, we will treat in detail a query on the execution traces of a non-executable program, using statistics on the traces of another program which is executable. The elements encountered in this analysis will then be used in a presentation of the overall architecture of Pearl's approach. Time permitting, we will discuss some aspects of probabilistic computations in this context, which can often be reformulated purely in terms of graph theory.
IOP seminar (open slot)
Joint seminar with OptimAI.
This talk focuses on models for multivariate count data, with emphasis on species abundance data. Two approaches emerge in this framework: the Poisson log-normal (PLN) and the Tree Dirichlet-multinomial (TDM) models. The first uses a latent Gaussian vector to model dependencies between species, whereas the second models dependencies directly on observed abundances. The TDM model assumes that the total abundance is fixed, and is therefore often used for microbiome data, since the sequencing depth (in RNA-seq) varies from one observation to another, leading to a total abundance that is not really interpretable. We propose to generalize the TDM model in two ways: by relaxing the fixed total abundance, and by using the Pólya distribution instead of the Dirichlet-multinomial. This family of models corresponds to Pólya urn models with a random number of draws and will be named Pólya splitting distributions. In the first part I will present the probabilistic properties of such models, with a focus on marginals and probabilistic graphical models. It will then be shown that these models emerge as stationary distributions of multivariate birth-death processes under simple parametric assumptions on the birth-death rates. These assumptions are related to the neutral theory of biodiversity, which assumes no biological interaction between species. Finally, the statistical aspects of Pólya splitting models will be presented: the regression framework, the inference, the consideration of a partition tree structure, and two applications on real data.
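The Dirichlet-multinomial (Pólya) building block mentioned above can be sampled with the classical urn scheme; a minimal sketch, where making the total number of draws itself random (here, uniform, purely as an illustrative assumption) mimics the "random number of draws" aspect of Pólya splitting distributions:

```python
import random

def polya_urn_sample(alpha, n_draws, rng):
    """One draw from the Dirichlet-multinomial (Polya) distribution via
    the urn scheme: each drawn ball is put back together with one extra
    ball of the same colour."""
    urn = list(alpha)            # current ball weights, one per colour
    counts = [0] * len(alpha)
    for _ in range(n_draws):
        j = rng.choices(range(len(urn)), weights=urn)[0]
        counts[j] += 1
        urn[j] += 1.0            # reinforcement step
    return counts

rng = random.Random(0)
# Random total number of draws (uniform on {0,...,20}: an arbitrary
# choice for the sketch, not one of the distributions in the talk).
total = rng.randint(0, 20)
counts = polya_urn_sample([1.0, 1.0, 1.0], total, rng)
```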
Deep learning has revolutionised image processing and is often considered to outperform classical approaches based on accurate modelling of the image formation process. In this presentation, we will discuss the interplay between model-based and learning-based paradigms, and show that hybrid approaches hold great promise for scientific imaging, where interpretation and robustness to real-world degradations are important. We will present two applications: super-resolution and high-dynamic-range imaging, and exoplanet detection from high-contrast direct imaging.
Don't forget to subscribe to the maths-ia mailing list!
https://listes.math.u-bordeaux.fr/wws/subscribe/mathsia?previous_action=info
In statistical learning, many analyses and methods rely on optimization, including its stochastic versions, introduced for example to overcome intractability of the objective function or to reduce the computational cost of the deterministic optimization step.
In 1951, H. Robbins and S. Monro introduced a novel iterative algorithm, named "Stochastic Approximation", for the computation of the zeros of a function defined by an expectation with no closed-form expression. This algorithm produces a sequence of iterates by replacing, at each iteration, the unknown expectation with a Monte Carlo approximation based on one sample. The method was later generalized: it is a stochastic algorithm designed to find the zeros of a vector field when only stochastic oracles of that vector field are available.
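A minimal sketch of the Robbins-Monro scheme, here used to find the zero of h(theta) = theta - E[X] (i.e. to estimate the mean of X) from one sample per iteration; the step sizes 1/n satisfy the classical conditions (their sum diverges, the sum of their squares converges):

```python
import random

def robbins_monro(oracle, theta0=0.0, n_iter=10000):
    """Stochastic approximation: theta_{n+1} = theta_n - gamma_n * H(theta_n, X_{n+1}),
    where H is an unbiased stochastic oracle of the target vector field h."""
    theta = theta0
    for n in range(1, n_iter + 1):
        gamma = 1.0 / n  # sum(gamma_n) = inf, sum(gamma_n^2) < inf
        theta -= gamma * oracle(theta)
    return theta

random.seed(0)
# Oracle for h(theta) = theta - E[X], with X uniform on [0, 2]: zero at E[X] = 1.
est = robbins_monro(lambda th: th - random.uniform(0.0, 2.0))
```

With this particular oracle and step size, the iterate is exactly the running average of the samples, which is the simplest instance of the scheme; the "beyond the gradient case" examples in the talk replace the unbiased oracle with a biased one.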
Stochastic Gradient Descent algorithms are the most popular examples of Stochastic Approximation: the oracles come from a Monte Carlo approximation of a large sum. Possibly less well known are the examples "beyond the gradient case", for at least two reasons. First, they rely on oracles that are biased approximations of the vector field, as occurs when biased Monte Carlo sampling is used to define the oracles. Second, the vector field is not necessarily a gradient field. Many examples in statistics, and more generally in statistical learning, are "beyond the gradient case": among them, let us cite compressed stochastic gradient descent, stochastic Majorize-Minimization methods such as the Expectation-Maximization algorithm, and the Temporal Difference algorithm in reinforcement learning.
In this talk, we will show that these "beyond the gradient case" Stochastic Approximation algorithms still converge, even when the oracles are biased, as soon as the parameters of the algorithm are sufficiently well tuned. We will discuss what "sufficiently well tuned" means when the quality criterion relies on epsilon-approximate stationarity. We will also comment on the efficiency of the algorithm through its sample complexity. These analyses are based on non-asymptotic convergence bounds in expectation: we will present a unified method to obtain such bounds for a large class of Stochastic Approximation methods, covering both the gradient case and the beyond-the-gradient case. Finally, a variance reduction technique will be described and its efficiency illustrated.
Stochastic optimization problems naturally appear in many application areas, including machine learning. Our goal is to go further in the analysis of the Stochastic Average Gradient Accelerated (SAGA) algorithm. To achieve this, we introduce a new
In this talk, I will consider Brownian motion in simple sub-Riemannian settings: the Heisenberg group and step-2 Carnot groups. We propose a construction of a coupling of two Brownian motions at a fixed time. This construction is based on a Legendre decomposition of the standard Brownian motion and of its Lévy area. We then deduce sharp estimates for the total-variation decay between the laws of the Brownian motions and, via a change-of-measure technique, a Bismut-type integration-by-parts formula as well as reverse-Poincaré-type regularization estimates for the associated semigroup. Joint work with Marc Arnaudon, Magalie Bénéfice and Delphine Féral.
Many problems, especially in machine learning, can be formulated as optimization problems. Using optimization algorithms, such as stochastic gradient descent or ADAM, has become a cornerstone for solving them. However, for many practical cases, theoretical proofs of their efficiency are lacking. In particular, it has been empirically observed that adding a momentum mechanism to stochastic gradient descent often allows these optimization problems to be solved more efficiently. In this talk, we introduce a condition, linked to a measure of gradient correlation, that allows us to characterize theoretically when this acceleration can be observed.
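The momentum mechanism mentioned above can be sketched as heavy-ball SGD, where a velocity term accumulates past stochastic gradients. A minimal illustration on a noisy 1-D quadratic (the objective, noise level, and hyperparameters are arbitrary choices for the sketch, not those analyzed in the talk):

```python
import random

def sgd_momentum(grad, x0, lr=0.05, beta=0.9, n_iter=500):
    """SGD with heavy-ball momentum: the velocity accumulates past
    stochastic gradients, which can accelerate progress when successive
    gradients are correlated."""
    x, v = x0, 0.0
    for _ in range(n_iter):
        v = beta * v + grad(x)  # momentum update
        x -= lr * v
    return x

random.seed(1)
noisy_grad = lambda x: 2.0 * x + random.gauss(0.0, 0.1)  # grad of x^2, plus noise
x_final = sgd_momentum(noisy_grad, x0=5.0)
```

The gradient-correlation condition of the talk formalizes when the accumulated velocity points in a consistently useful direction rather than amplifying noise.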
Statistical estimation in a geometric context has become an increasingly important issue in recent years, not least due to the development in machine learning of non-linear dimension reduction techniques, which involve projecting high-dimensional data onto a much lower-dimensional submanifold. The search for statistical guarantees justifying both the use and the effectiveness of these algorithms is now a much-studied area. In this talk, we will take a geometric view of the issue, and see how some usual curvature quantities translate into algorithmic guarantees. First, we will see that upper bounds on sectional curvature give good properties for barycenter estimation; then we will see that a lower bound on Ricci curvature implies the existence of depth points, giving rise to robust statistical estimators. This is based on joint work with Victor-Emmanuel Brunel (ENSAE Paris) and Shin-ichi Ohta (Osaka University).
Abstract: We examine the long-run distribution of stochastic gradient descent (SGD) in general, non-convex problems. Specifically, we seek to understand which regions of the problem's state space are more likely to be visited by SGD, and by how much. Using an approach based on the theory of large deviations and randomly perturbed dynamical systems, we show that the long-run distribution of SGD resembles the Boltzmann-Gibbs distribution of equilibrium thermodynamics, with temperature equal to the method's step-size and energy levels determined by the problem's objective and the statistics of the noise. Joint work with W. Azizian, J. Malick, P. Mertikopoulos
https://arxiv.org/abs/2406.09241, published at ICML 2024.
In many data-driven domains, a standard task is to understand how one distribution transforms into another on the basis of samples, with the goal of estimating this transformation for unseen out-of-sample points. The optimal transport map is one such canonical transformation between two distributions, which has been widely used in applied statistical problems, machine learning, and economics. Many estimators in the literature are prohibitively expensive and do not scale to settings where the number of samples is large or the dimension is moderately large. To remedy both of these issues, we propose and analyze the entropic transport map, a computationally efficient estimator of the optimal transport map based on entropic optimal transport (Cuturi, 2013). Thanks to Sinkhorn's algorithm, we can take advantage of many samples for the purposes of estimation while maintaining a near-optimal convergence rate in the low-smoothness regime. Recently, we leveraged these results to provide statistical estimation rates for the Schrödinger bridge between two distributions. This is joint work with Jonathan Niles-Weed (NYU).
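The entropic map sends a point x to a weighted barycenter of the target samples, with Gibbs-kernel weights involving the Sinkhorn dual potential. A minimal sketch in 1-D, with the potential g set to zero purely for illustration (an assumption; in the actual estimator g is obtained by running Sinkhorn on the samples):

```python
import math

def entropic_map(x, ys, g, eps=0.5, cost=lambda a, b: 0.5 * (a - b) ** 2):
    """Barycentric entropic map: T(x) = sum_j y_j * w_j(x), with weights
    w_j(x) proportional to exp((g[j] - cost(x, ys[j])) / eps), where g is
    the Sinkhorn dual potential attached to the target samples ys."""
    w = [math.exp((g[j] - cost(x, ys[j])) / eps) for j in range(len(ys))]
    z = sum(w)
    return sum(ys[j] * w[j] / z for j in range(len(ys)))
```

As eps shrinks, the weights concentrate on the cost-minimizing target points, approaching an optimal transport map; keeping eps positive is what makes the estimator cheap and out-of-sample evaluable.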
ReLU neural network parameterizations are well known to satisfy rescaling symmetries, which arise due to the homogeneity of the ReLU activation function. Ignoring such symmetries can lead to inefficient algorithms, non-informative theoretical bounds, and irrelevant interpretations of parameters. Can such symmetries be harnessed, theoretically and practically? This is the goal of the path-lifting, a rescaling-invariant polynomial representation of the parameters of ReLU networks and their modern variants with max-pooling and skip connections.
Despite its combinatorial dimension, the path-lifting yields easily computable quantities that reveal useful properties of the corresponding functions, from Lipschitz regularity to convexity and statistical generalization bounds. Besides introducing the general concept of path-lifting from basic examples and highlighting its key mathematical and computational properties, the talk will quickly tour some of its applications, such as network pruning with guarantees.
Primarily based on joint work with A. Gonon, N. Brisebarre, E. Riccietti (https://hal.science/hal-04225201v5, https://hal.science/hal-04584311v3)
and with A. Gagneux, M. Massias, E. Soubies (https://hal.science/hal-04877619v1)
Image generative models have something magical about them: starting from pure noise, they are able to produce stunning images of complex scenes aligned with the text given as condition. How can they produce something from what looks like nothing? In this talk, I will use two of my recent publications to try to shed some light on this paradox. In the first work¹, we use a continuity argument to investigate how each point of the Gaussian, used for sampling the initial noise, is mapped to images. We show that this induces a partition of the Gaussian and that not all partitions are equally good at generating high-quality images. Optimal partitions can help design faster training algorithms, since a good partition does not have to be discovered during training. In the second work², we focus on diffusion models and more specifically on Classifier-Free Guidance (CFG). Diffusion models map from the noise to the images by training a neural network that predicts the vector field of the corresponding ODE (or SDE). CFG is an acceleration technique that biases the predicted vector field so that the ODE can be solved in fewer steps. We investigate the influence of CFG at different timesteps of the ODE and show that its effect on the image depends dramatically on the timestep. We propose very simple (parameter-free) changes to mitigate negative effects, but we also show that more involved (parametrized) methods can lead to severe overfitting.
¹Unveiling the Latent Space Geometry of Push-Forward Generative Models. T Issenhuth, U Tanielian, J Mary, D Picard. ICML 2023
²Analysis of Classifier-Free Guidance Weight Schedulers. X Wang, N Dufour, N Andreou, MP Cani, VF Abrevaya, D Picard, V. Kalogeiton. TMLR 2024
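The CFG bias mentioned above is usually written as a weighted extrapolation of the conditional prediction away from the unconditional one; a minimal sketch on toy prediction vectors (the timestep-dependent schedulers studied in the second paper amount to letting the weight w vary along the ODE):

```python
def cfg_field(eps_uncond, eps_cond, w):
    """Classifier-free guidance: biases the predicted vector field away
    from the unconditional prediction by a guidance weight w.
    w = 1 recovers the conditional prediction; w > 1 extrapolates."""
    return [u + w * (c - u) for u, c in zip(eps_uncond, eps_cond)]
```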
The aim of this talk is to understand what quantum expanders are, what they are used for, and how they can be constructed. We will start by recalling the definition of classical expander graphs, and explain how to define quantum analogues of these objects. We will then show that, both classically and quantumly, random constructions typically provide examples of optimal expanders. In the quantum case, such a result follows from a spectral analysis of random matrix models with a tensor structure. Finally, we will see what this implies in terms of the typical decay of correlations in 1D quantum systems governed by local interactions.
The talk will mainly be based on the following works: https://arxiv.org/abs/1906.11682 (with David Pérez-Garcia), https://arxiv.org/abs/2302.07772 (with Pierre Youssef) and https://arxiv.org/abs/2409.17971.
In this presentation, a response matrix (here, species abundances) is assumed to depend on explanatory variables (here, environmental variables) that are assumed to be numerous and redundant, thus calling for dimension reduction. Supervised Component-based Generalized Linear Regression (SCGLR), a Partial Least Squares-type method, is designed to extract from the explanatory variables several components jointly supervised by the set of responses. However, this methodology still has some limitations that we aim to overcome in this work. The first limitation comes from the assumption that all the responses are predicted by the same explanatory space. The second is that previous works involving SCGLR assume the responses independent conditional on the explanatory variables. Again, this is not very likely in practice, especially in situations like those in ecology, where a non-negligible part of the explanatory variables may not have been measured. To overcome the first limitation, we assume that the responses are partitioned into several unknown groups, and that the responses in each group are predictable from an appropriate number of specific orthogonal supervised components of the explanatory variables. The second part of the work relaxes the conditional independence assumption: a set of few latent factors models the residual covariance matrix of the responses conditional on the components. The approaches presented in this work are tested on simulation schemes and then applied to ecological datasets.
Joint seminar with OptimAI.
Understanding the geometric properties of gradient descent dynamics is a key ingredient in deciphering the recent success of very large machine learning models. A striking observation is that trained over-parameterized models retain some properties of their initialization. This "implicit bias" is believed to be responsible for some favorable properties of the trained models and could explain their good generalization. In this work, we expose the definition and properties of "conservation laws", which are quantities conserved during the gradient flow of a given model (e.g. a ReLU network with a given architecture) for any training data and any loss. We then explain how to find the exact number of independent conservation laws via Lie algebra computations. This procedure recovers the conservation laws already known for linear and ReLU neural networks under Euclidean gradient flows, and proves that there are no others. We identify new laws for certain flows with momentum and/or non-Euclidean geometries.
Joint work with Gabriel Peyré and Rémi Gribonval. Associated papers: https://arxiv.org/abs/2307.00144 https://arxiv.org/abs/2405.12888
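A classical toy instance of such a conservation law (a simple illustration, not the Lie-algebraic machinery of the papers): for a two-parameter linear "network" x -> u*v*x trained on a smooth loss, the balancedness quantity u^2 - v^2 is exactly conserved along the gradient flow, and small-step gradient descent conserves it approximately:

```python
def gd_two_layer(u, v, y=2.0, lr=1e-4, n_iter=20000):
    """Gradient descent on L(u, v) = (u*v - y)^2.  Along the exact
    gradient flow, d/dt (u^2 - v^2) = 0; with small discrete steps the
    quantity is conserved up to O(lr) drift."""
    for _ in range(n_iter):
        r = u * v - y
        gu, gv = 2.0 * r * v, 2.0 * r * u  # partial derivatives
        u, v = u - lr * gu, v - lr * gv
    return u, v

u0, v0 = 2.0, 0.5
u1, v1 = gd_two_layer(u0, v0)
```

The conservation follows from u*gu = v*gv for this parameterization; the papers show how to enumerate all such independent invariants systematically.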
The behavior of the random feature model in a high-dimensional framework has recently become a popular topic of interest in the machine learning literature. This model is generally considered for feature vectors composed of independent and identically distributed (iid) entries. We move beyond this specific assumption, which may be restrictive in various applications. To this end, we propose studying the performance of the random feature model with non-iid data by introducing a variance profile to the feature matrix. The performance of this model is linked to the spectrum of the random feature matrix, which turns out to be a nonlinear mixture of random variance-profiled matrices. We compute the limiting traffic distribution of such matrices using an extension of the method of moments. Knowledge of this distribution allows us to introduce a new random matrix, which we call the "linear plus chaos" matrix, and which shares the same limiting spectrum as the random feature matrix. This linear plus chaos model proves simpler to study and has enabled us to derive deterministic equivalents that describe the asymptotic behavior of the performance of the random feature model.
We consider spatial stochastic population models, such as those modelling biological systems, which exhibit bistability, including several interacting particle systems and a variant of the Spatial Lambda Fleming-Viot process. In such models, narrow interfaces tend to form between regions dominated by one of the two stable states. To understand how the population evolves, we may study the dynamics of these interfaces in time. For several bistable population models, it is known from recent work that the limiting interfaces, under certain rescalings, converge to a geometric evolution called mean curvature flow. Surfaces evolving by mean curvature flow develop singularities in finite time, which imposes a short-time constraint and regularity assumptions on the previous convergence results.
In this talk, I will first discuss some models which exhibit this phenomenon, and results concerning their interfaces. I will then discuss a recent work which uses tools from analysis, in particular level-set methods and the theory of viscosity solutions, to improve upon recent interface convergence results for a broad class of bistable stochastic population models. In particular, we give checkable conditions on an ancestral dual process which guarantee that the interfaces converge globally in time to a generalized mean curvature flow.
This is joint work with Jessica Lin (McGill).
In this talk, we will present a high-level overview of systematic methods for analyzing and designing optimization algorithms, leveraging symbolic computations and semidefinite programming. In particular, we will introduce the performance estimation approach and demonstrate how it can be used to construct performance certificates and counterexamples, potentially in compact forms (e.g., via Lyapunov functions).
The presentation will be illustrated through concrete examples, as the core principles underlying these methods are already embedded in fundamental optimization schemes.
The methodology is implemented in the PEPit package (https://pepit.readthedocs.io/), which enables users to apply the framework without requiring direct SDP modeling. This talk is based on joint work with great collaborators, who will be acknowledged during the presentation.
A common approach to generative modeling is to split model-fitting into two blocks: define first how to sample noise (e.g. Gaussian) and choose next what to do with it (e.g. using a single map or flows). We explore in this work an alternative route that ties sampling and mapping. We find inspiration in moment measures, a result that states that for any measure
To be announced.