
Daily Arxiv

This page curates AI-related papers published worldwide.
All content is summarized with Google Gemini, and the site is operated on a non-profit basis.
Copyright for each paper belongs to its authors and their institutions; please credit the source when sharing.

Overcoming Slow Decision Frequencies in Continuous Control: Model-Based Sequence Reinforcement Learning for Model-Free Control

Created by
  • Haebom

Authors

Devdhar Patel, Hava Siegelmann

Outline

In this paper, we present a novel reinforcement learning algorithm called Sequence Reinforcement Learning (SRL). SRL is designed to generate a sequence of actions for a given input state, enabling effective control even at low decision frequencies. To address the difficulty of learning action sequences, SRL uses a model and an actor-critic architecture operating at different time scales. In particular, we propose a "temporal replay" mechanism in which the critic uses the model to estimate the intermediate states between primitive actions, providing a learning signal for each action in the sequence. Once training is complete, the actor generates action sequences independently of the model, achieving model-free control at low decision frequencies. To better evaluate performance across decision frequencies, we introduce the Frequency-Averaged Score (FAS) metric and demonstrate that SRL outperforms existing algorithms on continuous control tasks.
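To make the temporal replay idea concrete, here is a minimal PyTorch-style sketch. The network sizes, module names, and function name are illustrative assumptions, not the authors' implementation; it only shows how a learned model can fill in the intermediate states so the critic can score every action in a sequence produced from a single decision.

```python
import torch
import torch.nn as nn

# All sizes and modules below are illustrative assumptions,
# not the architecture from the paper.
STATE_DIM, ACTION_DIM, J = 8, 2, 4   # J = primitive actions per decision

actor  = nn.Linear(STATE_DIM, J * ACTION_DIM)          # one state -> J actions
critic = nn.Linear(STATE_DIM + ACTION_DIM, 1)          # Q(s, a) at the primitive scale
model  = nn.Linear(STATE_DIM + ACTION_DIM, STATE_DIM)  # learned one-step dynamics

def temporal_replay_actor_loss(state):
    """Score every action in the sequence by rolling the learned model
    forward to estimate the intermediate states that the environment
    never reports at the slow decision rate."""
    actions = actor(state).view(J, ACTION_DIM)
    s_hat, total_q = state, 0.0
    for j in range(J):
        sa = torch.cat([s_hat, actions[j]])
        total_q = total_q + critic(sa)   # learning signal for the j-th action
        s_hat = model(sa)                # estimated intermediate state
    # Maximizing the summed Q-values trains the actor; the critic and the
    # model would be trained separately (e.g. TD learning and supervised
    # one-step prediction) at the primitive time scale.
    return -total_q

loss = temporal_replay_actor_loss(torch.zeros(STATE_DIM))
loss.backward()
```

At deployment only the actor is kept: each observed state yields a full J-step action sequence, so control is model-free and new decisions are needed J times less often.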

Takeaways, Limitations

Takeaways:
We present a reinforcement learning algorithm that maintains high performance even at low decision frequencies.
It achieves performance comparable to model-based online planning algorithms while significantly reducing sample complexity.
We propose a new evaluation metric, the Frequency-Averaged Score (FAS), enabling performance comparison across decision frequencies (see the sketch after this list).
It improves applicability to real-world environments where high-frequency decision-making is impractical.
Limitations:
Further research is needed on the generalization performance of the proposed algorithm.
Additional performance evaluation in a wider range of environments is needed.
A more in-depth analysis of the efficiency of the temporal replay mechanism is needed.
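As one concrete reading of the FAS idea, the sketch below averages an agent's return over a grid of decision frequencies. The `evaluate_policy` callable, the frequency grid, and the plain unweighted mean are all assumptions for illustration, not the paper's exact definition.

```python
import numpy as np

def frequency_averaged_score(evaluate_policy, frequencies):
    """Average a policy's score over a grid of decision frequencies.

    `evaluate_policy(freq)` is assumed to return the mean episodic
    return when the agent may only choose a new action sequence `freq`
    times per second; lower frequencies are typically simulated by
    executing longer stretches of the sequence between decisions.
    """
    return float(np.mean([evaluate_policy(f) for f in frequencies]))

# Illustrative usage: sweep from 5 Hz to 50 Hz in 10 equal steps.
# fas = frequency_averaged_score(eval_fn, np.linspace(5, 50, 10))
```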