Daily Arxiv

This is a page that curates AI-related papers published worldwide.
All content is summarized using Google Gemini, and the page is operated on a non-profit basis.
Copyright for each paper belongs to its authors and their institutions; please credit the source when sharing.

Enhancing Personalized Multi-Turn Dialogue with Curiosity Reward

Created by
  • Haebom

Authors

Yanming Wan, Jiaxing Wu, Marwa Abdulhai, Lior Shani, Natasha Jaques

Outline

This paper presents a novel method for improving the personalization of conversational agents based on large language models (LLMs). Existing methods based on reinforcement learning from human feedback (RLHF) focus on usability and safety, but fall short of producing empathetic, adaptive, and personalized conversations. The authors propose integrating a curiosity-based intrinsic reward, derived from a user model, into multi-turn RLHF. The LLM agent is rewarded for actively inferring user traits and steering the conversation to improve the accuracy of its user model, which yields more personalized interactions. Experiments in conversational recommendation and education settings demonstrate improved personalization and generalization over standard RLHF baselines while maintaining conversation quality.
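To make the core idea concrete, below is a minimal sketch (not the authors' implementation) of a curiosity-style intrinsic reward: the agent is rewarded when the latest dialogue turn improves a user model's belief about the user's hidden traits, and this intrinsic term is added to the usual extrinsic RLHF reward. The class names, the discrete trait representation, and the weighting coefficient are illustrative assumptions.

```python
import numpy as np

class UserModel:
    """Maintains a belief (probability distribution) over hypothetical user trait types."""

    def __init__(self, num_trait_types: int):
        # Start from a uniform belief over the candidate trait types.
        self.belief = np.ones(num_trait_types) / num_trait_types

    def update(self, likelihood: np.ndarray) -> None:
        """Bayesian update given the likelihood of the new turn under each
        candidate trait type (assumed to come from an external classifier)."""
        posterior = self.belief * likelihood
        self.belief = posterior / posterior.sum()

    def accuracy(self, true_trait: int) -> float:
        """Probability currently assigned to the user's true trait
        (only available with simulated users during training)."""
        return float(self.belief[true_trait])


def curiosity_reward(user_model: UserModel,
                     likelihood: np.ndarray,
                     true_trait: int) -> float:
    """Intrinsic reward = improvement in the user model's accuracy
    after observing the latest dialogue turn."""
    acc_before = user_model.accuracy(true_trait)
    user_model.update(likelihood)
    acc_after = user_model.accuracy(true_trait)
    return acc_after - acc_before


# Example: combine intrinsic and extrinsic rewards for one simulated turn.
model = UserModel(num_trait_types=3)
turn_likelihood = np.array([0.7, 0.2, 0.1])  # how well the turn fits each trait type
r_intrinsic = curiosity_reward(model, turn_likelihood, true_trait=0)
r_extrinsic = 1.0                            # task reward from the RLHF environment
beta = 0.5                                   # weighting coefficient (a tunable assumption)
r_total = r_extrinsic + beta * r_intrinsic
```

In this sketch the agent maximizes `r_total`, so turns that reveal more about the user (raising the user model's accuracy) are reinforced alongside the task objective.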

Takeaways, Limitations

Takeaways:
Presents a novel method for improving the personalization of LLM-based conversational agents by leveraging user models and curiosity-based rewards.
Overcomes limitations of existing RLHF, achieving effective personalization even with limited user information.
Experimentally verifies improved personalization and generalization in conversational recommendation and education settings.
Suggests the potential for developing more empathetic, adaptive, and engaging conversational agents.
Limitations:
The effectiveness of the proposed method has been verified only in specific domains (conversational recommendation, education), and further research is needed on its generalizability to other domains.
Performance depends heavily on the accuracy of the user model, so errors in the user model can degrade the agent's behavior.
Further research may be needed on the design and tuning of curiosity-based rewards.
Extensive experimentation and evaluation across a variety of user types and characteristics are still required.