Daily Arxiv

This page curates papers on artificial intelligence published around the world.
Summaries are generated with Google Gemini, and the page is operated on a non-profit basis.
Copyright for each paper belongs to its authors and their institutions; when sharing, please cite the source.

Rethinking KL Regularization in RLHF: From Value Estimation to Gradient Optimization

Created by
  • Haebom

Author

Kezhao Liu, Jason Klein Liu, Mingtao Chen, Yiming Liu

Outline

By analyzing how the KL-divergence penalty is implemented in RLHF, we propose a unified framework that bridges the two implementation styles, "k_n as reward" and "k_n as loss." The framework clarifies the principle of reverse-KL (RKL) regularization and proves that, under on-policy conditions, "k_2 as loss" is gradient-equivalent to "k_1 as reward." Furthermore, we show that "k_3 as loss" is a biased approximation of the RKL gradient and propose a correction for the bias that arises in off-policy implementations.
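For context, the sketch below writes out the three standard estimators k_1, k_2, k_3 of the reverse KL and the two implementation styles the paper contrasts: folding a detached estimate into the reward versus keeping the estimator differentiable and adding it to the loss. This is a minimal illustration assuming PyTorch tensors of per-token log-probabilities; the function names and the β coefficient are hypothetical, not taken from the paper.

```python
# Illustrative sketch only (hypothetical names; not the paper's or any library's code).
# logp_theta, logp_ref: log-probabilities of the *sampled* tokens, sampled from pi_theta.
import torch

def kl_estimators(logp_theta: torch.Tensor, logp_ref: torch.Tensor):
    """The three standard per-sample estimators of KL(pi_theta || pi_ref)."""
    log_ratio = logp_theta - logp_ref                 # log(pi_theta / pi_ref)
    k1 = log_ratio                                    # unbiased for the KL value, high variance
    k2 = 0.5 * log_ratio ** 2                         # biased for the value, low variance
    k3 = (torch.exp(-log_ratio) - 1) + log_ratio      # unbiased for the value, low variance
    return k1, k2, k3

# Style A, "k_1 as reward": the penalty is detached and folded into the scalar reward,
# so it reaches the policy only through the policy-gradient (advantage) term.
def penalized_reward(reward, logp_theta, logp_ref, beta=0.1):
    k1, _, _ = kl_estimators(logp_theta, logp_ref)
    return reward - beta * k1.detach()

# Style B, "k_n as loss": the estimator is kept differentiable and added to the loss,
# so its gradient flows directly through logp_theta.
def kl_loss(logp_theta, logp_ref, beta=0.1, which="k2"):
    k1, k2, k3 = kl_estimators(logp_theta, logp_ref)
    return beta * {"k1": k1, "k2": k2, "k3": k3}[which].mean()
```

Note that being a good estimator of the KL value (as k_3 is) does not by itself guarantee a useful gradient when backpropagated as a loss; this distinction between value estimation and gradient behavior is what the paper's title points to.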

Takeaways, Limitations

Takeaways:
  • We provide a comprehensive picture of how the KL-divergence penalty is implemented, which helps improve the stability and efficiency of RLHF systems.
  • We identify a correct implementation of the RKL objective by proving that "k_2 as loss" and "k_1 as reward" are gradient-equivalent (a toy numerical check of this equivalence appears after this section).
  • We point out the limitations of "k_3 as loss" and propose a way to correct the bias that arises in off-policy implementations.
Limitations:
  • The paper may not include details on applying the proposed methodology to an actual RLHF system or verifying its performance there.
  • The analysis of how the proposed framework relates to other work on KL-divergence regularization may be limited.
  • Because the gradient-equivalence result holds only under on-policy conditions, further research on off-policy settings may be required.
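The equivalence and bias claims above can be checked numerically on a toy categorical policy. The NumPy sketch below is illustrative only; the variable names and the finite-difference check are assumptions, not the paper's experiments. It compares the expected on-policy gradients of "k_2 as loss" and "k_3 as loss" against finite-difference gradients of the reverse and forward KL.

```python
# Toy numeric check (illustrative only): compare the expected on-policy gradients of
# "k2 as loss" and "k3 as loss" with finite-difference gradients of reverse/forward KL.
import numpy as np

rng = np.random.default_rng(0)
K = 5
theta = rng.normal(size=K)               # policy logits
pi_ref = rng.dirichlet(np.ones(K))       # fixed reference distribution

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def reverse_kl(t):                       # KL(pi_theta || pi_ref)
    p = softmax(t)
    return np.sum(p * (np.log(p) - np.log(pi_ref)))

def forward_kl(t):                       # KL(pi_ref || pi_theta)
    p = softmax(t)
    return np.sum(pi_ref * (np.log(pi_ref) - np.log(p)))

def num_grad(f, t, eps=1e-6):            # central finite differences
    g = np.zeros_like(t)
    for j in range(len(t)):
        d = np.zeros_like(t); d[j] = eps
        g[j] = (f(t + d) - f(t - d)) / (2 * eps)
    return g

pi = softmax(theta)
score = np.eye(K) - pi[None, :]          # d log pi_i / d theta_j for softmax logits
log_ratio = np.log(pi) - np.log(pi_ref)  # log(pi_theta / pi_ref)

# Expected "k2 as loss" gradient: E_{x~pi}[ log_ratio(x) * d log pi(x)/d theta ]
grad_k2_loss = (pi * log_ratio) @ score
# Expected "k3 as loss" gradient: per-sample grad of (pi_ref/pi - 1) + log_ratio
# is (1 - pi_ref/pi) * d log pi/d theta
grad_k3_loss = (pi * (1.0 - pi_ref / pi)) @ score

print(np.allclose(grad_k2_loss, num_grad(reverse_kl, theta), atol=1e-5))  # True: matches reverse KL
print(np.allclose(grad_k3_loss, num_grad(reverse_kl, theta), atol=1e-5))  # False: biased for reverse KL
print(np.allclose(grad_k3_loss, num_grad(forward_kl, theta), atol=1e-5))  # True: it is the forward-KL gradient
```

In this toy setting the expected "k_2 as loss" gradient coincides with the reverse-KL gradient, while the expected "k_3 as loss" gradient instead coincides with the forward-KL gradient, which is one concrete way to see the bias discussed above.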