Daily Arxiv

This page organizes papers related to artificial intelligence published around the world.
The summaries are generated with Google Gemini, and the page is operated on a non-profit basis.
The copyright of each paper belongs to its authors and their institutions; when sharing, please cite the source.

Enhancing Personalized Multi-Turn Dialogue with Curiosity Reward

Created by
  • Haebom

Author

Yanming Wan, Jiaxing Wu, Marwa Abdulhai, Lior Shani, Natasha Jaques

Outline

Effective conversational agents built on large language models (LLMs) must personalize interactions to adapt to user preferences, personality traits, and attributes across diverse domains such as education and healthcare. Current methods prioritize usability and safety but fall short of facilitating truly empathetic, adaptive, and personalized conversations. In this paper, we propose incorporating a curiosity-based intrinsic reward into multi-turn RLHF, leveraging a user model. This novel reward mechanism encourages the LLM agent to actively infer user characteristics and to steer conversations toward improving the accuracy of the user model. Consequently, the agent learns more about the user, resulting in more personalized interactions. We demonstrate the effectiveness of our method in two areas: significantly improving personalization performance in conversational recommendation tasks, and adapting conversations to diverse learning styles in educational settings. Compared to standard multi-turn RLHF, our method demonstrates improved generalization while maintaining conversational quality.
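The core mechanism described above can be sketched in a few lines: the agent receives an intrinsic reward when its latest turn makes the user model's belief about the user more accurate. The function names, the dictionary-based belief representation, and the weighting coefficient `beta` below are illustrative assumptions, not the paper's actual implementation.

```python
def belief_accuracy(belief: dict, true_trait: str) -> float:
    # Probability the user model currently assigns to the user's true trait.
    return belief.get(true_trait, 0.0)

def curiosity_reward(belief_before: dict, belief_after: dict, true_trait: str) -> float:
    # Intrinsic reward: how much the agent's last turn improved
    # the user model's accuracy about the user.
    return belief_accuracy(belief_after, true_trait) - belief_accuracy(belief_before, true_trait)

# Example: a probing question shifts the user model toward the true trait.
before = {"visual": 0.4, "verbal": 0.6}
after = {"visual": 0.7, "verbal": 0.3}
r_intrinsic = curiosity_reward(before, after, true_trait="visual")

# The turn-level training signal adds the intrinsic term to the usual
# RLHF preference reward, weighted by a hypothetical coefficient beta.
beta = 0.5          # assumed weighting, not from the paper
r_rlhf = 1.0        # placeholder extrinsic preference reward
r_total = r_rlhf + beta * r_intrinsic
```

Under this sketch, an agent that asks informative questions earns positive intrinsic reward, while an uninformative turn earns none, which is the incentive the paper attributes to its curiosity reward.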

Takeaways, Limitations

Takeaways:
User-model-based, curiosity-driven rewards enhance the LLM's personalization capabilities.
Personalization remains effective for new users and in limited-context settings.
Personalization performance improves in both conversational recommendation and educational settings.
Generalization improves over the existing multi-turn RLHF baseline.
Limitations:
The paper does not explicitly discuss its Limitations.