
Daily Arxiv

This is a page that curates AI-related papers published worldwide.
All content here is summarized using Google Gemini, and the site is operated on a non-profit basis.
Copyright for each paper belongs to the authors and their institutions; please make sure to credit the source when sharing.

Multiple-Frequencies Population-Based Training

Created by
  • Haebom

Author

Wael Doulazmi, Auguste Lehuger, Marin Toromanoff, Valentin Charraut, Thibault Buhet, Fabien Moutarde

Outline

Reinforcement learning is highly sensitive to hyperparameters, which leads to instability and inefficiency, and hyperparameter optimization (HPO) algorithms have been developed to address this. Population-Based Training (PBT) has attracted attention for producing hyperparameter schedules rather than fixed settings: it trains multiple agents with different hyperparameters and repeatedly replaces low-performing agents with variants of stronger ones. However, because of this intermediate selection process, PBT favors short-term improvements and can fall into local optima, ultimately underperforming even plain random search in the long run. This paper studies how this greediness relates to the evolution frequency (how often selection occurs) and proposes MF-PBT (Multiple-Frequencies Population-Based Training), a new HPO algorithm that mitigates the greediness problem by using subpopulations that evolve at different frequencies. MF-PBT also introduces a migration process that transfers information between subpopulations to balance short-term and long-term optimization. Extensive experiments on the Brax suite show that MF-PBT improves sample efficiency and long-term performance without requiring its own hyperparameters to be tuned. A minimal sketch of this scheme follows below.
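The following is a minimal, self-contained Python sketch of the idea described above, not the authors' implementation: the agent, training step, and evaluation are toy stand-ins, and the mutation and migration rules (here, copying the overall best member over each subpopulation's worst) are simplified assumptions chosen for illustration.

```python
import copy
import random

def make_agent():
    # Toy stand-in for an RL agent: a single scalar "weight".
    return {"weights": 0.0}

def train_steps(agent, hparams):
    # Toy training: the learning rate moves the weight toward a target of 1.
    agent["weights"] += hparams["lr"] * (1.0 - agent["weights"])

def evaluate(agent):
    # Toy score: higher is better, maximal at weights == 1.
    return -abs(1.0 - agent["weights"])

def mutate(hparams):
    # Classic PBT-style perturbation of each hyperparameter.
    return {k: v * random.choice([0.8, 1.25]) for k, v in hparams.items()}

class Member:
    def __init__(self, hparams):
        self.hparams = hparams
        self.agent = make_agent()
        self.score = float("-inf")

def pbt_step(subpop):
    # One exploit/explore round inside a subpopulation: the bottom half
    # copies weights and mutated hyperparameters from the top half.
    subpop.sort(key=lambda m: m.score, reverse=True)
    half = len(subpop) // 2
    for loser, winner in zip(subpop[half:], subpop[:half]):
        loser.agent = copy.deepcopy(winner.agent)
        loser.hparams = mutate(winner.hparams)

def mf_pbt(frequencies=(1, 4, 16), members=4, rounds=32):
    # One subpopulation per frequency; a subpopulation with frequency f
    # only runs selection every f rounds, protecting long-horizon schedules.
    subpops = [[Member({"lr": random.uniform(0.01, 0.5)})
                for _ in range(members)] for _ in frequencies]
    for t in range(1, rounds + 1):
        for subpop in subpops:
            for m in subpop:
                train_steps(m.agent, m.hparams)
                m.score = evaluate(m.agent)
        for freq, subpop in zip(frequencies, subpops):
            if t % freq == 0:
                pbt_step(subpop)
        # Simplified migration: broadcast the overall best member into each
        # subpopulation so short-term gains spread without making every
        # subpopulation select greedily.
        best = max((m for sp in subpops for m in sp), key=lambda m: m.score)
        for subpop in subpops:
            worst = min(subpop, key=lambda m: m.score)
            worst.agent = copy.deepcopy(best.agent)
            worst.hparams = dict(best.hparams)
    return max((m for sp in subpops for m in sp), key=lambda m: m.score)

if __name__ == "__main__":
    best = mf_pbt()
    print(f"best lr={best.hparams['lr']:.3f}, score={best.score:.4f}")
```

Running the script prints the best-scoring member. The key point of the sketch is the division of labor: the frequency-1 subpopulation selects greedily every round, the frequency-16 subpopulation keeps long-horizon hyperparameter schedules alive, and migration lets the two regimes share progress.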

Takeaways, Limitations

Takeaways:
We present MF-PBT, a novel algorithm that addresses PBT's tendency to over-optimize for short-term improvements.
Subpopulations evolving at different frequencies, combined with a migration process, improve the balance between short-term and long-term optimization.
We demonstrate improved sample efficiency and long-term performance on the Brax suite without hyperparameter tuning.
Limitations:
The performance improvements of MF-PBT are demonstrated only on the Brax suite; further research is needed on its generalizability to other environments.
MF-PBT's own design choices, such as the optimal number of subpopulations and the migration strategy, require further study.
A more in-depth comparative analysis against other HPO algorithms is needed.