Daily Arxiv

This is a page that curates AI-related papers published worldwide.
All content here is summarized using Google Gemini and operated on a non-profit basis.
Copyright for each paper belongs to the authors and their institutions; please make sure to credit the source when sharing.

VRAIL: Vectorized Reward-based Attribution for Interpretable Learning

Created by
  • Haebom

Author

Jina Kim, Youjin Jang, Jeongjin Han

Outline

VRAIL (Vectorized Reward-based Attribution for Interpretable Learning) is a bi-level framework for value-based reinforcement learning (RL) that learns interpretable weight representations from state features. VRAIL consists of two phases: a deep learning (DL) phase that fits an estimated value function from state features, and an RL phase that shapes learning through a potential-based reward transformation. The estimator can take a linear or quadratic form, attributing importance to individual features and their interactions. Experiments on the Taxi-v3 environment show that VRAIL improves training stability and convergence over standard DQN without any environment modification. Further analysis shows that VRAIL discovers semantically meaningful subgoals, such as passenger possession, producing human-interpretable behavior. These results suggest that VRAIL serves as a general, model-agnostic reward-shaping framework that enhances both learning and interpretability.
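The core mechanism can be illustrated with a short sketch. This is not the authors' implementation; it only assumes the two ingredients named in the summary: a potential function Φ built from a linear or quadratic estimator over state features, and the standard potential-based shaping term r' = r + γΦ(s') − Φ(s) (which is known to preserve the optimal policy). The function names and feature vectors are hypothetical.

```python
import numpy as np

def potential_linear(features, w):
    """Linear potential: Phi(s) = w . phi(s).
    The weights w attribute importance to individual features."""
    return float(np.dot(w, features))

def potential_quadratic(features, w, W):
    """Quadratic potential: Phi(s) = w . phi(s) + phi(s)^T W phi(s).
    The matrix W additionally captures pairwise feature interactions."""
    return float(np.dot(w, features) + features @ W @ features)

def shaped_reward(r, phi_s, phi_next, gamma=0.99):
    """Potential-based reward shaping: r' = r + gamma * Phi(s') - Phi(s).
    Applied on top of the environment reward, so the environment itself
    is never modified."""
    return r + gamma * phi_next - phi_s

# Hypothetical usage: two-dimensional state features.
w = np.array([3.0, 4.0])
W = np.eye(2)
phi_s = potential_linear(np.array([1.0, 2.0]), w)        # 11.0
phi_q = potential_quadratic(np.array([1.0, 1.0]),
                            np.zeros(2), W)              # 2.0
r_shaped = shaped_reward(0.0, 1.0, 2.0, gamma=0.9)       # 0.8
```

In the bi-level loop described above, the DL phase would fit `w` (and `W`) to an estimated value function, and the RL phase would train a standard agent such as DQN on the shaped reward.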

Takeaways, Limitations

Takeaways:
VRAIL improves training stability and convergence compared to standard DQN.
It trains interpretable reinforcement learning agents without modifying the environment.
It identifies semantically meaningful subgoals, producing human-interpretable behavior.
It can serve as a general, model-agnostic reward-shaping framework.
Limitations:
Evaluation is currently limited to the Taxi-v3 environment; generalization to other environments requires further study.
Only linear and quadratic estimators were considered; more expressive estimators might further improve performance.
Further research is needed to determine whether the discovered subgoals generalize beyond this specific environment.