Daily Arxiv

This page curates AI-related papers published worldwide.
All summaries are generated with Google Gemini, and the page is operated on a non-profit basis.
Copyright for each paper belongs to its authors and their institutions; please credit the source when sharing.

Sublinear Regret for a Class of Continuous-Time Linear-Quadratic Reinforcement Learning Problems

Created by
  • Haebom

Authors

Yilie Huang, Yanwei Jia, Xun Yu Zhou

Outline

This paper studies reinforcement learning (RL) for a class of continuous-time linear-quadratic (LQ) control problems in which the state is scalar-valued, there is no running control reward, and the volatility of the state process depends on both the state and control variables. We devise a model-free RL algorithm that learns the optimal policy parameters directly, without relying on knowledge or estimation of the model parameters. The main contributions are the introduction of an exploration schedule and a regret analysis of the proposed algorithm. We establish a convergence rate of the policy parameters to their optimal values, and prove that the algorithm achieves a regret bound of $O(N^{\frac{3}{4}})$ up to a logarithmic factor, where $N$ is the number of learning episodes. Simulation studies verify the theoretical results and demonstrate the effectiveness and reliability of the proposed algorithm. We also perform a numerical comparison with a recent model-based stochastic LQ RL study adapted to the state- and control-dependent volatility setting, showing that the proposed model-free algorithm achieves better performance in terms of regret.
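To make the problem setting concrete, below is a minimal, self-contained sketch, not the paper's actual algorithm, of a scalar LQ problem whose diffusion coefficient depends on both state and control, simulated by Euler-Maruyama, with a Gaussian exploration policy whose variance decays over episodes and a simple REINFORCE-style update of the feedback gain. All coefficients, the exploration schedule, and the learning rate are illustrative assumptions.

```python
# Illustrative sketch only (not the paper's algorithm): a scalar LQ problem with
# state- and control-dependent volatility, a Gaussian policy u ~ N(theta * x, sigma_n^2)
# with a decaying exploration schedule, and a crude score-function (REINFORCE-style)
# update of the policy parameter theta. All constants below are made-up assumptions.
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical model parameters (unknown to the learner; used only to simulate).
A, B, C, D = -0.5, 1.0, 0.2, 0.3   # drift and volatility coefficients
Q, H = 1.0, 1.0                    # running and terminal state costs (no running control cost)
T, dt = 1.0, 0.01                  # horizon and Euler-Maruyama step size
steps = int(T / dt)

theta = 0.0                        # policy parameter: control mean is theta * x
N = 2000                           # number of learning episodes

for n in range(1, N + 1):
    sigma_n = 1.0 / n ** 0.25      # assumed decaying exploration schedule
    lr = 0.05 / n ** 0.75          # assumed decaying learning rate
    x = 1.0
    cost, grad_logp = 0.0, 0.0
    for _ in range(steps):
        u = theta * x + sigma_n * rng.standard_normal()   # Gaussian exploration around theta * x
        grad_logp += (u - theta * x) * x / sigma_n ** 2   # score of the Gaussian policy w.r.t. theta
        cost += Q * x ** 2 * dt                           # running cost on the state only
        dW = np.sqrt(dt) * rng.standard_normal()
        x += (A * x + B * u) * dt + (C * x + D * u) * dW  # state- and control-dependent volatility
    cost += H * x ** 2                                    # terminal cost
    theta -= lr * cost * grad_logp                        # REINFORCE-style gradient step on the cost
    theta = float(np.clip(theta, -5.0, 5.0))              # keep the illustrative update numerically stable

print(f"learned feedback gain theta ~ {theta:.3f}")
```

The sketch only illustrates the problem class and the idea of direct policy-parameter learning with scheduled exploration; the paper's algorithm and its $O(N^{\frac{3}{4}})$ regret analysis are more refined than this simple gradient estimator.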

Takeaways, Limitations

Takeaways:
We present a model-free reinforcement learning algorithm for continuous-time linear-quadratic (LQ) control problems with state- and control-dependent volatility.
We analytically prove that the regret of the proposed algorithm is bounded by $O(N^{\frac{3}{4}})$ up to a logarithmic factor.
The effectiveness and reliability of the algorithm are verified through simulation studies, and its superiority is demonstrated by comparison with a recent model-based method.
Limitations:
Applicable only when the state is scalar-valued.
Applicable only when there is no running control reward in the objective.
Further research is needed to extend the approach to higher-dimensional state spaces.
Additional application to and validation on real systems are required.