Daily Arxiv

This page curates AI-related papers published worldwide.
All content is summarized using Google Gemini, and the site is operated on a non-profit basis.
Copyright for each paper belongs to its authors and their institutions; please credit the source when sharing.

EMORL: Ensemble Multi-Objective Reinforcement Learning for Efficient and Flexible LLM Fine-Tuning

Created by
  • Haebom

Authors

Lingxiao Kong, Cong Yang, Susanne Neufang, Oya Deniz Beyan, Zeyd Boukhers

Outline

In this paper, we point out the limitations of existing methods for multi-objective tasks in reinforcement learning (RL)-based fine-tuning of large language models (LLMs): trade-offs between conflicting objectives, low training efficiency, limited scalability, and poor explainability. To address these, we propose a novel framework, EMORL (Ensemble Multi-Objective RL), which fine-tunes multiple models, each on a single objective, and then aggregates their hidden states to improve efficiency and flexibility. In particular, we present the first hidden-state aggregation method that integrates contextual information from multiple objectives, along with a hierarchical grid search algorithm that finds the optimal weight combination. Through experiments on a counselor response generation task, we show that the proposed method significantly reduces training data consumption and training time ($17,529 \pm 1,650$ data points and $6,573 \pm 147.43$ seconds, respectively) compared to existing methods, while maintaining comparable performance across the multiple objectives.
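The core idea is to combine the hidden states of several single-objective models with per-model weights. Below is a minimal sketch of that idea, assuming the aggregation is a weighted linear combination of last-layer hidden states; the function name, the normalization step, and the array shapes are illustrative assumptions rather than the paper's exact implementation.

```python
# Minimal sketch of weighted hidden-state aggregation across objective-specific
# models. The model interface and the linear combination are illustrative
# assumptions, not the paper's exact method.
import numpy as np

def aggregate_hidden_states(hidden_states: list[np.ndarray],
                            weights: list[float]) -> np.ndarray:
    """Linearly combine the last-layer hidden states of K single-objective
    models into one state used for generation."""
    assert len(hidden_states) == len(weights)
    w = np.asarray(weights, dtype=np.float64)
    w = w / w.sum()  # normalize so the combined state stays on the same scale
    return sum(wi * h for wi, h in zip(w, hidden_states))

# Example: three objective-specific models, sequence length 8, hidden size 16.
rng = np.random.default_rng(0)
states = [rng.standard_normal((8, 16)) for _ in range(3)]
combined = aggregate_hidden_states(states, weights=[0.5, 0.3, 0.2])
print(combined.shape)  # (8, 16)
```

Because aggregation happens after fine-tuning, each single-objective model can be trained independently, which is where the reported efficiency and flexibility gains come from.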

Takeaways, Limitations

Takeaways:
Addresses training efficiency and scalability problems in multi-objective reinforcement learning.
Improves performance and explainability by leveraging contextual information from multiple objectives through hidden-state aggregation.
Searches efficiently for the optimal combination of per-model weights using a hierarchical grid search algorithm (see the sketch after this list).
Demonstrates effectiveness on a real-world application, counselor response generation.
Limitations:
Further research is needed on the generalization performance of the proposed method.
Applicability to other types of multi-objective tasks remains to be verified.
The computational complexity of the hierarchical grid search algorithm needs to be reduced.
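For the hierarchical grid search mentioned above, the sketch below illustrates one plausible coarse-to-fine scheme over the aggregation weights: evaluate a coarse grid, then repeatedly zoom a finer grid around the current best point. The scoring function, grid schedule, and number of levels are illustrative assumptions, not the paper's algorithm.

```python
# Minimal coarse-to-fine grid search over two aggregation weights (the third is
# implied by normalization to the simplex). The scoring function and the grid
# schedule are illustrative assumptions, not the paper's exact procedure.
import itertools
import numpy as np

def score(weights):
    """Placeholder multi-objective score; in practice this would evaluate the
    aggregated model on validation data for each objective."""
    target = np.array([0.45, 0.35, 0.20])  # hypothetical optimum
    return -np.abs(np.asarray(weights) - target).sum()

def hierarchical_grid_search(levels=3, points=5):
    best_w, best_s = None, -np.inf
    center = np.array([0.5, 0.5])  # start from the middle of the weight space
    radius = 0.5
    for _ in range(levels):
        grid = np.linspace(-radius, radius, points)
        for dx, dy in itertools.product(grid, grid):
            w1, w2 = np.clip(center + [dx, dy], 0.0, 1.0)
            if w1 + w2 > 1.0:  # keep (w1, w2, w3) on the probability simplex
                continue
            w = (w1, w2, 1.0 - w1 - w2)
            s = score(w)
            if s > best_s:
                best_s, best_w = s, w
        center = np.array(best_w[:2])  # zoom in around the current best point
        radius /= points - 1           # shrink the search window each level
    return best_w, best_s

print(hierarchical_grid_search())
```

Compared with a single fine grid over all weight combinations, this coarse-to-fine strategy evaluates far fewer candidates, which is consistent with the efficiency claims in the summary, though the exact savings depend on the grid schedule.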