Daily Arxiv

This page organizes papers related to artificial intelligence published around the world.
This page is summarized using Google Gemini and is operated on a non-profit basis.
Copyright for each paper belongs to its authors and their institutions; when sharing, please cite the source.

Finite Sample Analysis of Linear Temporal Difference Learning with Arbitrary Features

Created by
  • Haebom

Author

Zixuan Xie, Xinyu Liu, Rohan Chandra, Shangtong Zhang

Outline

This paper studies the convergence rate of linear TD($\lambda$), one of the most fundamental reinforcement learning algorithms for policy evaluation. Prior analyses assume linearly independent features, an assumption that often fails in practice. This paper establishes the first $L^2$ convergence rate for linear TD($\lambda$) with arbitrary features, without any algorithmic modification or additional assumptions, and the result applies to both the discounted-reward and average-reward settings. Because arbitrary features can make the solution non-unique, the analysis also develops a novel stochastic-approximation result that characterizes the rate of convergence to a set of solutions rather than to a single point.
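For context, below is a minimal sketch of the standard linear TD($\lambda$) update with accumulating eligibility traces in the discounted-reward setting, which is the algorithm whose convergence rate the paper analyzes. The environment interface (`env.reset`, `env.step`) and the feature map `phi` are illustrative placeholders, not taken from the paper; the paper's contribution is the analysis of this unmodified update when the features may be linearly dependent.

```python
import numpy as np

def linear_td_lambda(env, phi, num_features, num_episodes,
                     alpha=0.01, gamma=0.99, lam=0.9):
    """Estimate v_pi(s) ~= phi(s) @ w with the standard TD(lambda) update.

    `env` and `phi` are hypothetical placeholders: `env.step()` is assumed to
    follow a fixed policy internally and return (next_state, reward, done).
    """
    w = np.zeros(num_features)
    for _ in range(num_episodes):
        s = env.reset()
        z = np.zeros(num_features)          # eligibility trace
        done = False
        while not done:
            s_next, r, done = env.step()
            # TD error under the current linear value estimate
            v = phi(s) @ w
            v_next = 0.0 if done else phi(s_next) @ w
            delta = r + gamma * v_next - v
            # accumulate the trace and move the weights along it
            z = gamma * lam * z + phi(s)
            w += alpha * delta * z
            s = s_next
    return w
```

With linearly dependent features, the fixed point of this iteration need not be unique, which is why the paper characterizes convergence to a solution set rather than to a single weight vector.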

Takeaways, Limitations

Takeaways:
We prove the first $L^2$ convergence rate of linear TD($\lambda$) with arbitrary features.
The results apply to both the discounted-reward and average-reward settings.
We address the non-uniqueness of solutions by developing a new stochastic-approximation result that guarantees convergence to the solution set.
Limitations:
The paper does not propose new algorithms or algorithmic improvements; the contribution is purely theoretical.
No quantitative or empirical evaluation is included; this summary is based on the abstract, which reports no specific figures or data.