Daily Arxiv

This page curates papers on artificial intelligence published around the world.
Summaries are generated with Google Gemini, and the page is operated on a non-profit basis.
Copyright of each paper belongs to its authors and their institutions; when sharing, please cite the source.

Stackelberg Coupling of Online Representation Learning and Reinforcement Learning

Created by
  • Haebom

Author

Fernando Martinez, Tao Li, Yingdong Lu, Juntao Chen

Outline

SCORER is a framework for value-based reinforcement learning that treats representation learning and Q-learning as a hierarchical (Stackelberg) game. The Q-function acts as the leader and updates less frequently, while the encoder acts as the follower, learning a representation that minimizes the variance of the Bellman error given the leader's strategy. These asymmetric updates enable stable co-evolution of the two components and reduce bias. The Stackelberg formulation is approximated with a two-timescale algorithm that produces asymmetric learning dynamics between the two players. Experiments with DQN and its variants show that the gains come from this algorithmic insight rather than from added model complexity.
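As a rough illustration of the asymmetric two-timescale scheme described above, the sketch below shows one possible realization in PyTorch: the encoder (follower) takes a fast gradient step on the variance of the TD error, while the Q-head (leader) takes a slower, less frequent step on a DQN-style mean squared TD loss. The network shapes, learning rates, leader update ratio, and the use of the squared TD loss for the leader are illustrative assumptions, not details taken from the paper.

```python
# Minimal sketch of a SCORER-style two-timescale update (illustrative, not the
# authors' implementation). Assumptions: an MLP encoder, DQN-style TD targets,
# the encoder (follower) minimizing the variance of the TD error, and the
# Q-head (leader) minimizing the mean squared TD error on a slower schedule.
import torch
import torch.nn as nn

obs_dim, feat_dim, n_actions = 8, 64, 4
gamma = 0.99

encoder = nn.Sequential(nn.Linear(obs_dim, feat_dim), nn.ReLU())  # follower
q_head = nn.Linear(feat_dim, n_actions)                           # leader

# Two-timescale: the follower adapts faster than the leader (rates assumed).
enc_opt = torch.optim.Adam(encoder.parameters(), lr=1e-3)
q_opt = torch.optim.Adam(q_head.parameters(), lr=1e-4)
LEADER_UPDATE_EVERY = 4  # leader updates less frequently (assumed ratio)


def td_error(obs, actions, rewards, next_obs, dones):
    """DQN-style TD error: delta = r + gamma * max_a' Q(s', a') - Q(s, a)."""
    q_sa = q_head(encoder(obs)).gather(1, actions.unsqueeze(1)).squeeze(1)
    with torch.no_grad():
        next_q = q_head(encoder(next_obs)).max(dim=1).values
        target = rewards + gamma * (1.0 - dones) * next_q
    return target - q_sa


def scorer_step(step, batch):
    obs, actions, rewards, next_obs, dones = batch

    # Follower (encoder): minimize the variance of the Bellman/TD error,
    # treating the leader's current Q-head as fixed.
    delta = td_error(obs, actions, rewards, next_obs, dones)
    follower_loss = delta.var()
    enc_opt.zero_grad()
    follower_loss.backward()
    q_head.zero_grad(set_to_none=True)  # discard leader grads on the fast step
    enc_opt.step()

    # Leader (Q-head): standard mean squared TD error, updated less often.
    if step % LEADER_UPDATE_EVERY == 0:
        delta = td_error(obs, actions, rewards, next_obs, dones)
        leader_loss = (delta ** 2).mean()
        q_opt.zero_grad()
        leader_loss.backward()
        encoder.zero_grad(set_to_none=True)  # freeze follower during leader step
        q_opt.step()
```

In training, `scorer_step(t, batch)` would be called each step with a minibatch (long-tensor actions, float rewards/dones) sampled from a replay buffer; target networks, exploration, and other DQN machinery are omitted for brevity.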

Takeaways, Limitations

Takeaways:
  • Uses hierarchical (Stackelberg) game theory to address the instability between representation learning and value learning, inducing stable co-evolution.
  • Ensures learning stability through asymmetric updates between the Q-function and the encoder.
  • Improves performance through algorithmic insight, without increasing model complexity.
Limitations:
  • Requires modeling a complex interaction between two strategic agents.
  • Further research is needed on the efficiency of the two-timescale algorithm and how accurately it solves the underlying optimization problem.
  • Further experiments are needed to determine whether the approach extends to other reinforcement learning algorithms.