Daily Arxiv

This page organizes papers on artificial intelligence published around the world.
Summaries are generated with Google Gemini, and the page is operated on a non-profit basis.
Copyright of each paper belongs to its authors and their institutions; when sharing, please cite the source.

Markov Decision Processes under External Temporal Processes

Created by
  • Haebom

Author

Ranga Shaarad Ayyagari, Revanth Raj Eega, Ambedkar Dukkipati

Outline

This paper studies Markov decision processes (MDPs) influenced by external temporal processes, addressing a limitation of existing reinforcement learning algorithms, which primarily assume a stationary environment. The authors show that, when the changes induced by the external process satisfy certain conditions, the problem can be solved by considering only a finite history of past events. Building on this, they propose a policy iteration algorithm that conditions on both the current environment state and a finite history of past external events, and provide a theoretical analysis. Although the algorithm is not guaranteed to converge, it guarantees policy improvement over certain regions of the state space, depending on the errors introduced by the approximate policy and value functions. The paper further derives the sample complexity of least-squares policy evaluation and policy improvement, accounting for the approximation introduced by truncating the history of past temporal events to a finite window. The approach applies to general discrete-time processes satisfying certain conditions, and additional analysis is provided for a discrete-time Hawkes process with Gaussian marks. Experimental results on policy evaluation and deployment in classic control environments are also presented.
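To make the idea concrete, below is a minimal sketch (not the authors' implementation) of least-squares policy evaluation on a state augmented with a finite window of recent external events, using a discrete-time Hawkes process with Gaussian marks as the external process. All dynamics, feature maps, policy, and parameter values are illustrative assumptions.

```python
# Minimal sketch: LSTD-style policy evaluation where the value function is a
# function of (current state, finite history of recent external-event marks).
# The external process is a discrete-time Hawkes process with Gaussian marks.
# Everything below (dynamics, rewards, features, constants) is an assumption
# made for illustration, not the paper's actual setup.
import numpy as np

rng = np.random.default_rng(0)

# --- Discrete-time Hawkes process with Gaussian marks (assumed parameters) ---
MU, ALPHA, BETA = 0.2, 0.5, 0.8   # base rate, excitation, decay
MARK_STD = 1.0

def step_hawkes(intensity, event):
    """One step of a discrete-time Hawkes process with Gaussian marks."""
    new_intensity = MU + BETA * (intensity - MU) + ALPHA * event
    new_event = float(rng.random() < min(new_intensity, 1.0))  # Bernoulli approximation
    new_mark = rng.normal(0.0, MARK_STD) if new_event else 0.0
    return new_intensity, new_event, new_mark

# --- Toy controlled dynamics perturbed by the external process (assumed) ---
def env_step(x, a, mark):
    return 0.9 * x + 0.1 * a + 0.3 * mark + 0.05 * rng.normal()

def reward(x, a):
    return -(x ** 2) - 0.01 * a ** 2

def policy(x):                 # fixed policy being evaluated
    return -0.5 * x

# --- Features: current state plus a finite history of K past marks ---
K = 3
def features(x, hist):
    return np.concatenate(([1.0, x, x ** 2], hist))

# --- Least-squares (LSTD-style) policy evaluation over the augmented state ---
GAMMA, T = 0.95, 20000
dim = 3 + K
A = np.zeros((dim, dim))
b = np.zeros(dim)

x, intensity, event = 0.0, MU, 0.0
hist = np.zeros(K)
phi = features(x, hist)
for _ in range(T):
    a_t = policy(x)
    intensity, event, mark = step_hawkes(intensity, event)
    x_next = env_step(x, a_t, mark)
    hist = np.roll(hist, 1); hist[0] = mark        # truncated event history
    phi_next = features(x_next, hist)
    A += np.outer(phi, phi - GAMMA * phi_next)
    b += phi * reward(x, a_t)
    x, phi = x_next, phi_next

w = np.linalg.lstsq(A, b, rcond=None)[0]           # value-function weights
print("Estimated value at x=0 with empty history:", features(0.0, np.zeros(K)) @ w)
```

The history window of K past marks stands in for the "finite history of past events" in the paper; the quality of the resulting value estimate depends on how much of the external process's influence is captured within that window.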

Takeaways, Limitations

Takeaways:
  • Presents a new approach to reinforcement learning in non-stationary environments influenced by external temporal processes, moving beyond the static-environment assumption.
  • Identifies conditions under which the problem can be solved using only a finite history of past events.
  • Proposes and theoretically analyzes a policy iteration algorithm applicable to such non-stationary environments.
  • Provides a sample-complexity analysis of least-squares policy evaluation and policy improvement.
  • Demonstrates applicability to various temporal processes, including the Hawkes process.
Limitations:
  • Convergence of the proposed policy iteration algorithm is not guaranteed.
  • The region of the state space in which policy improvement is guaranteed depends on the approximation error.
  • Experimental results are limited to classic control environments; further research is needed to assess generalizability to complex, non-stationary real-world settings.