This is a page that curates AI-related papers published worldwide. All content here is summarized using Google Gemini and operated on a non-profit basis. Copyright for each paper belongs to the authors and their institutions; please make sure to credit the source when sharing.
Direct Preference Optimization Using Sparse Feature-Level Constraints
Created by
Haebom
Author
Qingyu Yin, Chak Tou Leong, Hongbo Zhang, Minjun Zhu, Hanqi Yan, Qiang Zhang, Yulan He, Wenjie Li, Jun Wang, Yue Zhang, Linyi Yang
Outline
In this paper, we propose Feature-level constrained Preference Optimization (FPO), an efficient method for aligning large language models (LLMs) with human preferences. Unlike conventional RLHF or DPO, FPO improves computational efficiency and training stability by leveraging pre-trained sparse autoencoders (SAEs) and feature-level constraints: it enforces a sequential KL divergence on sparsely activated features against an offline reference, achieving both efficiency and strong performance. Experimental results on benchmark datasets show that FPO improves the win rate by 5.08% at a much lower computational cost than existing state-of-the-art techniques.
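To make the idea concrete, below is a minimal sketch of how a feature-level constraint could be combined with a DPO-style objective. It assumes a pretrained SAE encoder (`sae_encode`), precomputed offline reference feature activations, and hypothetical names and hyperparameters (`alpha`, `beta`); the exact constraint form used in the paper may differ, so treat this as an illustration rather than the authors' implementation.

```python
import torch
import torch.nn.functional as F

def fpo_style_loss(policy_hidden, ref_feats_offline,
                   logp_chosen, logp_rejected,
                   ref_logp_chosen, ref_logp_rejected,
                   sae_encode, beta=0.1, alpha=0.1):
    """Illustrative DPO loss with a feature-level KL constraint (not the official FPO code).

    policy_hidden:      hidden states from the policy model (batch, seq, d_model)
    ref_feats_offline:  precomputed (offline) reference SAE feature activations
    sae_encode:         pretrained sparse autoencoder encoder returning sparse features
    """
    # Standard DPO preference term on log-probability margins
    pi_margin = logp_chosen - logp_rejected
    ref_margin = ref_logp_chosen - ref_logp_rejected
    dpo_loss = -F.logsigmoid(beta * (pi_margin - ref_margin)).mean()

    # Sparse feature activations of the policy (only a few features fire per token)
    policy_feats = sae_encode(policy_hidden)

    # Feature-level constraint: KL divergence between normalized sparse feature
    # distributions of the policy and the offline reference
    p = F.softmax(policy_feats, dim=-1)
    q = F.softmax(ref_feats_offline, dim=-1)
    feat_kl = (p * (p.clamp_min(1e-9).log() - q.clamp_min(1e-9).log())).sum(-1).mean()

    return dpo_loss + alpha * feat_kl
```

Because the reference feature activations are computed offline, no reference model needs to be kept in memory during training, which is one source of the efficiency gains the paper reports.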
Takeaways, Limitations
•
Takeaways:
◦
We present a novel method that can significantly improve the computational efficiency and stability of the LLM alignment process.
◦
Enables efficient alignment by leveraging sparse features.
◦
Achieves high performance at a lower computational cost than existing methods.
◦
FPO is presented as a promising solution for efficient and controllable LLM alignment.
•
Limitations:
◦
Further validation of the generalization performance of the proposed method is needed.
◦
The performance of FPO may be affected by the performance of the sparse autoencoder used.
◦
Further research is needed on the optimal setting of feature-level constraints.
◦
Further experimental validation on various LLM architectures and datasets is needed.