Daily Arxiv

This is a page that curates AI-related papers published worldwide.
All content here is summarized using Google Gemini and operated on a non-profit basis.
Copyright for each paper belongs to the authors and their institutions; please make sure to credit the source when sharing.

Data-Efficient Safe Policy Improvement Using Parametric Structure

Created by
  • Haebom

Authors

Kasper Engelen, Guillermo A. Pérez, Marnix Suilen

Outline

This paper studies the Safe Policy Improvement (SPI) problem, an offline reinforcement learning problem in which a new policy that reliably outperforms the behavior policy must be computed using only a dataset and the behavior policy itself. To improve the data efficiency of existing MDP-based SPI, the paper proposes three techniques that exploit parametric dependencies between distributions in the transition dynamics. First, it presents a parametric SPI algorithm that uses known correlations between distributions to estimate the transition dynamics more accurately. Second, it presents a preprocessing technique that removes redundant actions using a game-based abstraction. Third, it presents a more advanced preprocessing technique based on satisfiability modulo theories (SMT) solving that identifies additional redundant actions. Experimental results show that these techniques improve the data efficiency of SPI by orders of magnitude while maintaining the same reliability.
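The intuition behind the first technique can be illustrated with a small sketch. It is not the authors' algorithm. It only assumes that several state-action pairs are known to share one unknown success probability, so their samples can be pooled into a single estimate. The names `pooled_estimates`, `param_of`, and `p_slip` are hypothetical, and the Hoeffding bound is used here as a generic stand-in for whatever confidence bound an SPI method relies on.

```python
import math
from collections import defaultdict


def pooled_estimates(observations, param_of):
    """Estimate each shared parameter from all samples that depend on it.

    observations: iterable of (state, action, success) with success a bool.
    param_of: dict mapping (state, action) to the name of the (hypothetical)
              parameter that governs that transition.
    Returns a dict: parameter name -> (estimated probability, sample count).
    """
    counts = defaultdict(lambda: [0, 0])  # parameter -> [successes, trials]
    for state, action, success in observations:
        name = param_of[(state, action)]
        counts[name][0] += int(success)
        counts[name][1] += 1
    return {name: (s / n, n) for name, (s, n) in counts.items()}


def hoeffding_radius(n, delta):
    """Two-sided Hoeffding error radius for n Bernoulli samples, confidence 1 - delta."""
    return math.sqrt(math.log(2 / delta) / (2 * n))


# Two state-action pairs that are assumed to share the same parameter.
param_of = {("s0", "a"): "p_slip", ("s1", "a"): "p_slip"}
observations = [
    ("s0", "a", True),
    ("s0", "a", False),
    ("s1", "a", True),
    ("s1", "a", True),
]

estimates = pooled_estimates(observations, param_of)
pooled_n = estimates["p_slip"][1]            # 4 samples after pooling
radius_pooled = hoeffding_radius(pooled_n, 0.05)
radius_single = hoeffding_radius(2, 0.05)    # each pair alone has 2 samples
```

With pooling, the shared parameter is estimated from four samples instead of two per state-action pair, so the error radius shrinks for the same dataset. This is the sense in which known parametric structure can reduce the amount of data needed.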

Takeaways, Limitations

Takeaways:
Exploiting parametric dependencies between distributions in the transition dynamics can improve the data efficiency of SPI by orders of magnitude.
Game-based abstraction and SMT-based preprocessing can remove redundant actions, further improving data efficiency.
The proposed techniques improve data efficiency while maintaining reliability, which makes them promising for real-world applications.
Limitations:
The effectiveness of the proposed techniques may vary with the environment and problem setting.
SMT-based preprocessing can be computationally expensive.
Further experiments and analysis are needed across a wider range of environments and problem settings.