Daily Arxiv

This page curates AI-related papers published worldwide.
All summaries are generated with Google Gemini, and the page is operated on a non-profit basis.
Copyright for each paper belongs to its authors and their institutions; please credit the source when sharing.

Toward a Metrology for Artificial Intelligence: Hidden-Rule Environments and Reinforcement Learning

Created by
  • Haebom

Author

Christo Mathew, Wentian Wang, Jacob Feldman, Lazaros K. Gallos, Paul B. Kantor, Vladimir Menkov, Hao Wang

Outline

This paper studies reinforcement learning in the Game of Hidden Rules (GOHR) environment, a complex puzzle in which an agent must infer and execute a hidden rule in order to clear a 6x6 board by moving game pieces into buckets. We explore two state representation strategies, feature-centric (FC) and object-centric (OC), and train the agent with a transformer-based Advantage Actor-Critic (A2C) algorithm. The agent receives only partial observations, so it must infer the governing rule from experience while learning an optimal policy. We evaluate the model in multiple rule-based and trial-list-based experimental settings, analyzing transfer effects and the impact of each representation on learning efficiency.
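For readers unfamiliar with the two representation strategies, the sketch below shows one plausible way to encode a GOHR-style 6x6 board state both ways. The board size comes from the paper; the specific attribute sets (shapes, colors), channel layout, and function names are illustrative assumptions rather than the authors' exact encoding.

```python
import numpy as np

# Illustrative board and attribute vocabulary; the paper's exact feature set may differ.
BOARD = 6
SHAPES = ["circle", "square", "triangle", "star"]
COLORS = ["red", "blue", "green", "yellow"]

def feature_centric(pieces):
    """Feature-centric (FC) view: one plane per attribute value over the 6x6 grid,
    giving a board-shaped tensor of shape (num_channels, 6, 6)."""
    channels = len(SHAPES) + len(COLORS) + 1              # +1 occupancy plane
    state = np.zeros((channels, BOARD, BOARD), dtype=np.float32)
    for (row, col, shape, color) in pieces:
        state[SHAPES.index(shape), row, col] = 1.0
        state[len(SHAPES) + COLORS.index(color), row, col] = 1.0
        state[-1, row, col] = 1.0                          # occupancy
    return state

def object_centric(pieces):
    """Object-centric (OC) view: one token per piece, suitable as a set/sequence
    input to a transformer; each token concatenates the piece's attributes."""
    token_dim = 2 * BOARD + len(SHAPES) + len(COLORS)
    tokens = []
    for (row, col, shape, color) in pieces:
        token = np.concatenate([
            np.eye(BOARD)[row],                            # row one-hot
            np.eye(BOARD)[col],                            # column one-hot
            np.eye(len(SHAPES))[SHAPES.index(shape)],
            np.eye(len(COLORS))[COLORS.index(color)],
        ]).astype(np.float32)
        tokens.append(token)
    return np.stack(tokens) if tokens else np.zeros((0, token_dim), dtype=np.float32)

# Example: two pieces still on the board.
pieces = [(0, 3, "circle", "red"), (5, 1, "square", "blue")]
print(feature_centric(pieces).shape)   # (9, 6, 6)
print(object_centric(pieces).shape)    # (2, 20)
```

The FC tensor is a natural fit for grid-style encoders, while the OC token list matches the per-piece attention that a transformer policy would apply.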

Takeaways, Limitations

Takeaways:
We demonstrate the applicability of transformer-based reinforcement learning algorithms in complex puzzle environments such as GOHR (a minimal A2C update sketch follows this list).
By analyzing how the choice of state representation (FC vs. OC) affects learning efficiency, we provide insight into designing effective state representations.
We show that an agent can simultaneously infer hidden rules and learn a policy in a partially observed environment.
By analyzing transfer learning effects across a variety of experimental setups, we enhance our understanding of the generalization ability of reinforcement learning agents.
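As a reference for the training setup these takeaways refer to, here is a minimal sketch of a standard Advantage Actor-Critic (A2C) update in PyTorch. The loss structure (policy gradient weighted by the advantage, value regression, entropy bonus) is the textbook A2C formulation; the small MLP, dimensions, and hyperparameters below are placeholders and do not reproduce the paper's transformer architecture or training schedule.

```python
import torch
import torch.nn as nn
from torch.distributions import Categorical

class ActorCritic(nn.Module):
    def __init__(self, obs_dim, num_actions, hidden=128):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(obs_dim, hidden), nn.ReLU())
        self.policy_head = nn.Linear(hidden, num_actions)  # action logits
        self.value_head = nn.Linear(hidden, 1)             # state-value estimate

    def forward(self, obs):
        h = self.encoder(obs)
        return self.policy_head(h), self.value_head(h).squeeze(-1)

def a2c_loss(model, obs, actions, returns, value_coef=0.5, entropy_coef=0.01):
    """Single-batch A2C loss: policy gradient weighted by the advantage,
    value regression toward the empirical return, and an entropy bonus."""
    logits, values = model(obs)
    dist = Categorical(logits=logits)
    advantages = returns - values.detach()                 # A(s,a) = R - V(s)
    policy_loss = -(dist.log_prob(actions) * advantages).mean()
    value_loss = (returns - values).pow(2).mean()
    entropy = dist.entropy().mean()
    return policy_loss + value_coef * value_loss - entropy_coef * entropy

# Toy usage with random data (obs_dim and action count are placeholders).
model = ActorCritic(obs_dim=20, num_actions=36)
optimizer = torch.optim.Adam(model.parameters(), lr=3e-4)
obs = torch.randn(8, 20)
actions = torch.randint(0, 36, (8,))
returns = torch.randn(8)
loss = a2c_loss(model, obs, actions, returns)
optimizer.zero_grad()
loss.backward()
optimizer.step()
```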
Limitations:
Due to the complexity of the GOHR environment, interpretation and analysis of the learning process can be challenging.
The paper lacks a comparative analysis of the chosen A2C algorithm against other reinforcement learning algorithms.
The scope of the experimental setup may be limited, and further research is needed on more diverse environments and rules.
A more in-depth analysis and a stronger theoretical basis for the choice of state representation strategy are needed.