Organizers: Luis Fredes and Camille Male
Stochastic optimization problems naturally appear in many application areas, including machine learning. Our goal is to go further in the analysis of the Stochastic Average Gradient Accelerated (SAGA) algorithm. To this end, we introduce a new $\lambda$-SAGA algorithm that interpolates between stochastic gradient descent ($\lambda=0$) and the SAGA algorithm ($\lambda=1$). First, we investigate the almost sure convergence of this new algorithm with decreasing step sizes, which allows us to avoid the restrictive strong convexity and Lipschitz gradient hypotheses on the objective function. Second, we establish a central limit theorem for the $\lambda$-SAGA algorithm. Finally, we provide non-asymptotic $L^p$ rates of convergence.
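The interpolation described above can be sketched in a few lines. This is a hedged illustration, not the paper's exact scheme: the update direction $g_i - \lambda(\alpha_i - \bar\alpha)$ recovers plain SGD at $\lambda=0$ and the SAGA variance-reduction correction at $\lambda=1$; the least-squares objective, step size, and iteration counts below are illustrative assumptions.

```python
import random

def lambda_saga(grad_i, n, x0, lam, step, iters, seed=0):
    """One-dimensional sketch of a lambda-SAGA iteration.

    lam = 0 reduces to plain SGD; lam = 1 applies the full SAGA
    correction using a table of previously seen gradients.
    """
    rng = random.Random(seed)
    x = x0
    memory = [0.0] * n   # stored past gradients alpha_i
    avg = 0.0            # running average of the memory table
    for _ in range(iters):
        i = rng.randrange(n)
        g = grad_i(i, x)
        # interpolated direction: g - lam * (alpha_i - mean(alpha))
        x -= step * (g - lam * (memory[i] - avg))
        avg += (g - memory[i]) / n   # keep the average in sync
        memory[i] = g
    return x

# Illustrative least-squares problem: f_i(x) = (a_i x - b_i)^2,
# whose common minimizer is x* = 2 since b_i = 2 * a_i.
a = [1.0, 2.0, 3.0, 4.0, 5.0]
b = [2.0 * ai for ai in a]
grad = lambda i, x: 2.0 * a[i] * (a[i] * x - b[i])

x_sgd = lambda_saga(grad, len(a), 0.0, lam=0.0, step=0.005, iters=3000)
x_saga = lambda_saga(grad, len(a), 0.0, lam=1.0, step=0.005, iters=3000)
```

Both endpoints of the interpolation converge here; intermediate values of `lam` trade variance reduction against the memory requirement of the gradient table.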
In this talk, I will focus on Brownian motion in simple settings of sub-Riemannian geometry: the Heisenberg group and step-2 Carnot groups. We propose a construction of a coupling of two Brownian motions at a fixed time. This construction is based on a Legendre decomposition of the standard Brownian motion and of its Lévy area. We then deduce precise estimates of the decay in total variation between the laws of the Brownian motions and, via a change-of-measure technique, a Bismut-type integration by parts formula as well as reverse Poincaré regularization estimates for the associated semigroup. Joint work with Marc Arnaudon, Magalie Bénéfice and Delphine Féral.
Many problems, especially in machine learning, can be formulated as optimization problems. Optimization algorithms such as stochastic gradient descent or ADAM have become a cornerstone for solving them. However, for many practical cases, theoretical proofs of their efficiency are lacking. In particular, it has been observed empirically that adding a momentum mechanism to stochastic gradient descent often solves these optimization problems more efficiently. In this talk, we introduce a condition, linked to a measure of gradient correlation, that allows us to characterize theoretically when this acceleration can be observed.
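The momentum mechanism mentioned above can be sketched as a heavy-ball update on top of SGD. This is a generic illustration under assumed parameters, not the condition studied in the talk; the test problem and all constants are hypothetical.

```python
import random

def sgd_momentum(grad_i, n, x0, step, beta, iters, seed=0):
    """Sketch of SGD with heavy-ball momentum; beta = 0 is plain SGD."""
    rng = random.Random(seed)
    x, v = x0, 0.0
    for _ in range(iters):
        g = grad_i(rng.randrange(n), x)
        v = beta * v + g   # momentum accumulates correlated gradients
        x -= step * v
    return x

# Illustrative objective: f_i(x) = (x - c_i)^2, minimizer = mean(c) = 2.
c = [1.0, 2.0, 3.0]
grad = lambda i, x: 2.0 * (x - c[i])
x_final = sgd_momentum(grad, len(c), 10.0, step=0.01, beta=0.9, iters=5000)
```

Intuitively, when successive stochastic gradients are positively correlated, the velocity term `v` keeps reinforcing the same direction, which is the kind of acceleration effect the gradient-correlation condition is meant to capture.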
To be announced.
Abstract: We examine the long-run distribution of stochastic gradient descent (SGD) in general, non-convex problems. Specifically, we seek to understand which regions of the problem's state space are more likely to be visited by SGD, and by how much. Using an approach based on the theory of large deviations and randomly perturbed dynamical systems, we show that the long-run distribution of SGD resembles the Boltzmann-Gibbs distribution of equilibrium thermodynamics, with temperature equal to the method's step size and energy levels determined by the problem's objective and the statistics of the noise. Joint work with W. Azizian, J. Malick, P. Mertikopoulos.
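Schematically, the Boltzmann-Gibbs analogy in this abstract can be written as follows; the precise form of the energy and the temperature's dependence on the step size are given in the paper, so this display is only a hedged paraphrase:

```latex
% Long-run (invariant) distribution of SGD with step size \eta:
% a Gibbs-type measure whose temperature scales with \eta
\pi_{\eta}(dx) \;\propto\; \exp\!\left(-\frac{E(x)}{\eta}\right) dx,
```

where $E$ is an energy functional determined by the objective and the statistics of the gradient noise.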
https://arxiv.org/abs/2406.09241, published at ICML 2024.
To be announced.
To be announced.
To be announced.
To be announced.
To be announced.
To be announced.
To be announced.
To be announced.