Daily Arxiv

This page curates AI-related papers published worldwide.
Summaries are generated with Google Gemini, and the page is operated on a non-profit basis.
Copyright for each paper belongs to its authors and their institutions; please credit the source when sharing.

MF-OML: Online Mean-Field Reinforcement Learning with Occupation Measures for Large Population Games

Created by
  • Haebom

Author

Anran Hu, Junzi Zhang

Outline

This paper proposes Mean-Field Occupation-Measure Learning (MF-OML), an online mean-field reinforcement learning algorithm for computing approximate Nash equilibria of large population sequential symmetric games. MF-OML is the first fully polynomial-time multi-agent reinforcement learning algorithm that provably solves Nash equilibria (up to mean-field approximation errors that vanish as the number of players N tends to infinity) beyond zero-sum games and variants of potential games. For games with strong Lasry-Lions monotonicity, it achieves a high-probability regret upper bound of $\tilde{O}(M^{3/4}+N^{-1/2}M)$, measured by the cumulative deviation from Nash equilibria, and for games with only Lasry-Lions monotonicity, it achieves a regret upper bound of $\tilde{O}(M^{11/12}+N^{-1/6}M)$, where M is the total number of episodes and N is the number of agents in the game. As a by-product, the paper also obtains the first tractable, globally convergent computational algorithm for computing approximate Nash equilibria of monotone mean-field games.
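For intuition about how these bounds behave, the minimal sketch below simply evaluates the two regret expressions from the abstract for a few hypothetical values of M (episodes) and N (agents); the specific numbers are illustrative and not taken from the paper, and the hidden logarithmic factors of the $\tilde{O}$ notation are ignored.

```python
def regret_strong(M: int, N: int) -> float:
    """Bound under strong Lasry-Lions monotonicity: M^(3/4) + M / N^(1/2).
    Log factors hidden by the tilde are ignored in this illustration."""
    return M ** 0.75 + M * N ** (-0.5)

def regret_monotone(M: int, N: int) -> float:
    """Bound under plain Lasry-Lions monotonicity: M^(11/12) + M / N^(1/6)."""
    return M ** (11 / 12) + M * N ** (-1 / 6)

if __name__ == "__main__":
    # Hypothetical episode counts and population sizes, chosen only to show scaling.
    for M in (10**4, 10**6):
        for N in (10**2, 10**4):
            print(f"M={M:>8}, N={N:>6}: "
                  f"strong={regret_strong(M, N):14.1f}, "
                  f"monotone={regret_monotone(M, N):14.1f}")
```

Both expressions are sublinear in M only up to the N-dependent term, which reflects the mean-field approximation error that vanishes as the population size N grows.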

Takeaways, Limitations

Takeaways:
We propose MF-OML, a new algorithm for efficiently computing approximate Nash equilibria of large population sequential symmetric games.
It is the first fully polynomial-time algorithm that provably solves Nash equilibria beyond zero-sum games and variants of potential games.
We present a tractable, globally convergent algorithm for computing approximate Nash equilibria of monotone mean-field games.
Explicit regret upper bounds are provided under Lasry-Lions monotonicity conditions.
Limitations:
The algorithm's performance depends on the Lasry-Lions monotonicity condition and may not be applicable to all games.
The regret upper bound includes the mean-field approximation error, so it may not exactly reflect the deviation from the true finite-N Nash equilibrium.
The actual performance of the algorithm may vary depending on the characteristics of the game and requires further experimental verification.