Daily Arxiv

This is a page that curates AI-related papers published worldwide.
All content here is summarized using Google Gemini, and the page is operated on a non-profit basis.
Copyright for each paper belongs to the authors and their institutions; please make sure to credit the source when sharing.

A Policy-Gradient Approach to Solving Imperfect-Information Games with Best-Iterate Convergence

Created by
  • Haebom

Author

Mingyang Liu, Gabriele Farina, Asuman Ozdaglar

Outline

This paper studies whether policy gradient methods, which are widely used in single-agent reinforcement learning, can be applied to two-player zero-sum imperfect-information extensive-form games (EFGs). Existing EFG-solving methods rely on approximating counterfactual values, with which policy gradient methods are not compatible. The paper presents the first result guaranteeing best-iterate convergence to a regularized Nash equilibrium under self-play with a policy gradient method. This shows that policy gradient methods enjoy theoretically guaranteed convergence in extensive-form games while efficiently using stochastic trajectory feedback and avoiding importance sampling corrections.
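To make the idea concrete, below is a minimal self-play sketch, not the paper's algorithm: two softmax policies trained with REINFORCE-style policy gradients plus entropy regularization on a tiny zero-sum matrix game, using only payoffs sampled from the current policies (no counterfactual values, no importance sampling). The payoff matrix, temperature, step size, and iteration count are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Zero-sum payoff matrix for player 1 (player 2 receives the negative): rock-paper-scissors.
A = np.array([[ 0.0,  1.0, -1.0],
              [-1.0,  0.0,  1.0],
              [ 1.0, -1.0,  0.0]])

tau = 0.1       # entropy-regularization temperature (assumed for illustration)
lr = 0.05       # step size (assumed)
iters = 20000

theta1 = np.zeros(3)   # softmax logits for player 1
theta2 = np.zeros(3)   # softmax logits for player 2

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def entropy_grad(p):
    # Gradient of the entropy H(p) = -sum_i p_i log p_i with respect to the softmax logits.
    H = -np.sum(p * np.log(p))
    return -p * (np.log(p) + H)

for t in range(iters):
    p1, p2 = softmax(theta1), softmax(theta2)

    # Stochastic trajectory feedback: sample one joint action from the current policies.
    a1 = rng.choice(3, p=p1)
    a2 = rng.choice(3, p=p2)
    r1 = A[a1, a2]   # payoff to player 1
    r2 = -r1         # payoff to player 2 (zero-sum)

    # REINFORCE-style gradient of the expected payoff w.r.t. the logits
    # (grad log pi(a) = e_a - p), plus the entropy-regularization gradient.
    # No importance sampling is needed: actions are sampled from the current policies.
    g1 = r1 * (np.eye(3)[a1] - p1) + tau * entropy_grad(p1)
    g2 = r2 * (np.eye(3)[a2] - p2) + tau * entropy_grad(p2)

    theta1 += lr * g1
    theta2 += lr * g2

# Both policies should drift toward the regularized equilibrium
# (approximately uniform play for standard rock-paper-scissors).
print("player 1 policy:", np.round(softmax(theta1), 3))
print("player 2 policy:", np.round(softmax(theta2), 3))
```

The normal-form game above is a stand-in for an extensive-form game; the point is only that each player updates with on-policy sampled returns rather than exact counterfactual values.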

Takeaways, Limitations

Takeaways:
We provide the first theoretical convergence guarantee for policy gradient methods in two-player zero-sum imperfect-information extensive-form games.
Best-iterate convergence to a regularized Nash equilibrium is guaranteed via self-play.
Policy gradient methods are shown to be efficient and theoretically sound even in extensive-form games.
Limitations:
Currently limited to two-player zero-sum games; generalization to multi-agent settings is needed.
Convergence is guaranteed only to a regularized Nash equilibrium; the strength and effect of the regularization need further study (a general formulation is sketched after this list).
Further experimental verification of performance and efficiency in real game environments is needed.
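For reference, one common way to formalize a regularized Nash equilibrium in a two-player zero-sum game is the entropy-regularized saddle point below; this is a general formulation assumed for illustration, not necessarily the exact regularization used in the paper.

```latex
% Entropy-regularized zero-sum game with payoff matrix A; x and y are the players'
% mixed strategies on the simplex \Delta, \tau > 0 is the regularization temperature,
% and \mathcal{H} is the Shannon entropy.
\[
  f_{\tau}(x, y) \;=\; x^{\top} A y \;+\; \tau\,\mathcal{H}(x) \;-\; \tau\,\mathcal{H}(y),
  \qquad
  x^{*} \in \operatorname*{arg\,max}_{x \in \Delta}\,\min_{y \in \Delta} f_{\tau}(x, y),
  \quad
  y^{*} \in \operatorname*{arg\,min}_{y \in \Delta}\,\max_{x \in \Delta} f_{\tau}(x, y).
\]
% As \tau \to 0 the regularized equilibrium approaches a Nash equilibrium of the
% original game; larger \tau yields smoother, more exploratory equilibrium policies.
```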