
Daily Arxiv

This is a page that curates AI-related papers published worldwide.
All content here is summarized using Google Gemini, and the site is operated on a non-profit basis.
Copyright for each paper belongs to the authors and their institutions; please make sure to credit the source when sharing.

Learning Pluralistic User Preferences through Reinforcement Learning Fine-tuned Summaries

Created by
  • Haebom

Authors

Hyunji Nam, Yanming Wan, Mickel Liu, Jianxun Lian, Natasha Jaques

Outline

This paper presents PLUS (Preference Learning Using Summarization), a framework for personalizing large language model (LLM) responses to individual users' preferences and goals. Standard RLHF (Reinforcement Learning from Human Feedback) fits a single reward model across all users and therefore cannot capture user-to-user variability. PLUS instead learns a text-based summary of each user's preferences, characteristics, and past conversations, and conditions the reward model on that summary so it can make personalized predictions about which kinds of responses each user values. The user-summarization model is trained with reinforcement learning while the reward model is simultaneously updated, forming an online co-adaptation loop. Across diverse user datasets, PLUS is shown to be robust to new users and new conversation topics, and the learned summaries transfer zero-shot to personalize strong proprietary models such as GPT-4. Because the summaries are concise, portable, and easy for users to inspect and edit, they also improve transparency and user control over LLM alignment.
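To make the training scheme concrete, here is a minimal Python sketch of the co-adaptation loop described above. Every name in it (summarize, reward_model, the binary summary reward) is an illustrative assumption rather than the paper's actual implementation, and the model calls are stubbed so the snippet runs as-is.

```python
# Illustrative sketch of the PLUS co-adaptation loop. All names and the
# binary policy reward are assumptions for illustration; the LLM and
# reward-model calls are stubbed so the script executes as written.
import math
import random

def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))

def summarize(history: list[str]) -> str:
    """Stub for the user-summary policy (an LLM trained with RL in the paper)."""
    return "Prefers concise, technical answers with concrete examples."

def reward_model(summary: str, prompt: str, response: str) -> float:
    """Stub for the reward model, conditioned on the user summary."""
    return random.random()

def co_adaptation_step(history, prompt, chosen, rejected):
    summary = summarize(history)

    # Reward-model side: a standard pairwise (Bradley-Terry style)
    # preference loss, except every input is conditioned on the summary.
    r_chosen = reward_model(summary, prompt, chosen)
    r_rejected = reward_model(summary, prompt, rejected)
    rm_loss = -math.log(sigmoid(r_chosen - r_rejected))

    # Summary-policy side: reward the summary if it lets the conditioned
    # reward model rank the user's preferred response higher. A real
    # implementation would apply a policy-gradient update (e.g., PPO) here.
    summary_reward = 1.0 if r_chosen > r_rejected else 0.0
    return rm_loss, summary_reward

# Toy usage with a single preference pair.
loss, reward = co_adaptation_step(
    history=["User asked for a NumPy one-liner instead of a loop."],
    prompt="How do I sum a list of numbers in Python?",
    chosen="Use sum(xs), or np.sum(xs) for arrays.",
    rejected="Numbers can be summed in many fascinating ways...",
)
print(f"reward-model loss: {loss:.3f}, summary reward: {reward}")
```

The notable design point is that personalization never flows through an opaque user embedding: the reward model only sees the human-readable summary, which is what makes the result interpretable and editable by the user.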

Takeaways, Limitations

Takeaways:
Presents an effective framework for personalizing LLM responses to individual user preferences.
Shows robustness to new users and diverse conversation topics.
Verifies that the learned user summaries transfer zero-shot to proprietary models such as GPT-4 (a usage sketch follows after the Limitations list).
User summaries are concise, portable, and easy to interpret and modify, improving transparency and user control over LLM alignment.
Limitations:
Further research is needed on real-world deployment and scalability of the PLUS framework.
Generalization to a wider range of user datasets remains to be verified.
The accuracy and reliability of the generated user summaries need further evaluation.
Robustness to malicious or adversarial user input requires analysis.
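On the transferability takeaway above: the paper reports that learned summaries can personalize proprietary models such as GPT-4 zero-shot. One natural way to do this, sketched below with the OpenAI Python SDK, is to prepend the summary to the system prompt; the exact prompt format and summary text here are assumptions, not the paper's.

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# In PLUS this would come from the trained summary model;
# the text here is a made-up example.
user_summary = (
    "This user prefers short, code-first answers with minimal preamble "
    "and appreciates explicit trade-off discussions."
)

response = client.chat.completions.create(
    model="gpt-4",
    messages=[
        # The learned summary conditions the proprietary model through the
        # system prompt; no fine-tuning of GPT-4 is required.
        {
            "role": "system",
            "content": f"User profile: {user_summary}\n"
                       "Tailor your answers to this user.",
        },
        {"role": "user", "content": "Explain beam search."},
    ],
)
print(response.choices[0].message.content)
```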