In this paper, we present a novel method for improving the personalization of conversational agents built on large language models (LLMs). Existing methods based on reinforcement learning from human feedback (RLHF) focus on usability and safety, but they fall short in producing empathetic, adaptive, and personalized conversations. We propose an approach that augments multi-turn RLHF with a curiosity-driven intrinsic reward derived from a user model: the LLM agent is rewarded for actively inferring user characteristics and steering the conversation so that the accuracy of its user model improves, thereby enabling more personalized interactions. Through experiments in conversational recommendation and training environments, we demonstrate improved personalization and generalization compared to standard RLHF baselines while maintaining conversation quality.
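To make the core idea concrete, the sketch below illustrates one plausible way such an intrinsic reward could be computed: the reward at each turn is the gain in how accurately a user model predicts the user's hidden traits after observing the latest exchange, which can then be added to the usual extrinsic RLHF reward. This is a minimal illustrative sketch under assumed interfaces (`UserModel`, `curiosity_reward`, trait dictionaries), not the paper's actual implementation.

```python
# Minimal sketch (assumed interfaces, not the paper's implementation):
# the intrinsic reward at a turn is the improvement in how well a user model
# predicts the user's hidden traits after observing the latest exchange.

from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class UserModel:
    """Hypothetical user model holding a belief over user traits."""
    belief: Dict[str, float] = field(default_factory=dict)

    def update(self, dialogue_history: List[str]) -> None:
        # Placeholder: a learned model would infer traits (preferences,
        # expertise, tone, ...) from the dialogue so far.
        ...

    def accuracy(self, true_traits: Dict[str, float]) -> float:
        # Placeholder metric: fraction of traits predicted within a tolerance.
        if not true_traits:
            return 0.0
        hits = sum(
            1 for name, value in true_traits.items()
            if abs(self.belief.get(name, 0.0) - value) < 0.1
        )
        return hits / len(true_traits)


def curiosity_reward(model: UserModel,
                     dialogue_history: List[str],
                     true_traits: Dict[str, float]) -> float:
    """Intrinsic reward = gain in user-model accuracy after the latest turn."""
    acc_before = model.accuracy(true_traits)
    model.update(dialogue_history)
    acc_after = model.accuracy(true_traits)
    return acc_after - acc_before


# In a combined objective, this intrinsic term would typically be weighted and
# added to the extrinsic human-feedback reward, e.g.
#   total_reward = rlhf_reward + beta * curiosity_reward(...)
# where beta is a hypothetical trade-off coefficient.
```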