Daily Arxiv

This is a page that curates AI-related papers published worldwide.
All content here is summarized using Google Gemini and operated on a non-profit basis.
Copyright for each paper belongs to the authors and their institutions; please make sure to credit the source when sharing.

Solving Truly Massive Budgeted Monotonic POMDPs with Oracle-Guided Meta-Reinforcement Learning

Created by
  • Haebom

Authors

Manav Vora, Jonas Liang, Melkior Ornik

Outline

This paper proposes a method for solving budget-constrained monotonic partially observable Markov decision processes (POMDPs) with many components. In a monotonic POMDP, the system state deteriorates progressively until a restorative action is taken, which makes the model well suited to sequential repair problems. Existing methods become computationally intractable because the state space grows exponentially with the number of components. The paper addresses this with a two-step approach. First, the optimal value function of each component POMDP is approximated with a random forest model, and these approximations are used to allocate the shared budget across components. Second, an oracle-guided, meta-trained Proximal Policy Optimization (PPO) algorithm solves each of the resulting independent, budget-constrained single-component monotonic POMDPs. The oracle policy is obtained by value iteration on the corresponding monotonic Markov decision process (MDP). The authors demonstrate the method on a real-world inspection and repair scenario for an administrative building, and show its scalability through a computational complexity analysis over a varying number of components.
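As a rough illustration of how an oracle policy could be obtained by value iteration, the sketch below runs finite-horizon backward induction on a toy, fully observable monotonic MDP with a repair budget. The health levels, the decay probability `p_decay`, the survival reward, and the function name `oracle_value_iteration` are all assumptions made for this example; none of them is taken from the paper.

```python
import numpy as np

def oracle_value_iteration(n_levels=5, budget=2, horizon=20, p_decay=0.3):
    """Backward induction on a toy budgeted monotonic MDP.

    Illustrative assumptions (not from the paper):
    - health levels 0..n_levels-1, where 0 is an absorbing failure state
    - action 0 (wait) is free; action 1 (repair) costs one budget unit
      and resets health to the top level before the decay step
    - each step, health drops by one level with probability p_decay
    - reward is 1 for every step that starts in a non-failed state
    """
    top = n_levels - 1
    # V[t, s, b]: best expected reward from time t, health s, remaining budget b
    V = np.zeros((horizon + 1, n_levels, budget + 1))
    policy = np.zeros((horizon, n_levels, budget + 1), dtype=int)

    def expected_next(t, s, b):
        # Monotonic decay: health either stays at s or drops to s - 1.
        return (1 - p_decay) * V[t + 1, s, b] + p_decay * V[t + 1, s - 1, b]

    for t in range(horizon - 1, -1, -1):
        for b in range(budget + 1):
            for s in range(1, n_levels):  # s = 0 is failed and keeps value 0
                q_wait = 1.0 + expected_next(t, s, b)
                q_repair = -np.inf  # repair is infeasible without budget
                if b > 0:
                    q_repair = 1.0 + expected_next(t, top, b - 1)
                if q_repair > q_wait:
                    policy[t, s, b] = 1
                    V[t, s, b] = q_repair
                else:
                    V[t, s, b] = q_wait
    return V, policy
```

In this toy model the budget is carried as part of the state, so the resulting `policy[t, s, b]` table tells the agent when spending a repair is worth it. A table like this is the kind of fully observable reference that a learning agent under partial observability could be guided by, although how the paper's algorithm uses the oracle is not detailed in this summary.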

Takeaways, Limitations

Takeaways:
Presents an efficient method for solving multi-component monotonic POMDPs under a limited budget.
Combines random forest value approximation with an oracle-guided, meta-trained PPO algorithm, opening the way to solving large-scale problems.
Validates the method's practicality on a real-world administrative building maintenance scenario.
Demonstrates scalability through a computational complexity analysis as the number of components increases.
Limitations:
The quality of the budget allocation may depend on the accuracy of the random forest value approximation.
The accuracy of the oracle policy can affect the performance of the entire algorithm.
When applied to real-world problems, model parameter tuning may be required.
Further research may be needed to determine the generalizability of this method to various types of monotonic POMDP problems.
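To make the budget-allocation step and its sensitivity to value estimates more concrete, here is a minimal sketch that splits an integer budget across components given a per-component table of estimated values. The exact dynamic program used here is a stand-in chosen for clarity and is not necessarily the optimizer used in the paper; the name `allocate_budget` and the table layout `value_tables[i][b]` are hypothetical, and the numbers would in practice come from a regressor such as a random forest.

```python
def allocate_budget(value_tables, total_budget):
    """Split an integer budget across components to maximize summed value.

    value_tables[i][b] is the (estimated) value of component i when it
    receives b budget units. This layout is an assumption of the sketch.
    """
    n = len(value_tables)
    neg_inf = float("-inf")
    # best[i][b]: best total value of the first i components using at most b units
    best = [[0.0] * (total_budget + 1)]
    best += [[neg_inf] * (total_budget + 1) for _ in range(n)]
    choice = [[0] * (total_budget + 1) for _ in range(n)]

    for i, table in enumerate(value_tables):
        for b in range(total_budget + 1):
            # Try giving k units to component i, limited by its table length.
            for k in range(min(b, len(table) - 1) + 1):
                cand = best[i][b - k] + table[k]
                if cand > best[i + 1][b]:
                    best[i + 1][b] = cand
                    choice[i][b] = k

    # Backtrack to recover the per-component allocation.
    alloc = [0] * n
    b = total_budget
    for i in range(n - 1, -1, -1):
        alloc[i] = choice[i][b]
        b -= alloc[i]
    return alloc, best[n][total_budget]


if __name__ == "__main__":
    # Made-up value estimates for three components and 0..2 budget units each.
    tables = [[0, 5, 6], [0, 3, 7], [0, 1, 2]]
    print(allocate_budget(tables, 3))
```

Because the components interact only through the shared budget, the work in this sketch grows linearly with the number of components rather than with the size of a joint state space. It also shows why the first limitation matters: any error in the entries of `value_tables` feeds directly into the chosen allocation.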