Effective conversational agents, such as Large Language Models (LLMs), must personalize interactions to adapt to user preferences, personality traits, and attributes across diverse domains, including education and healthcare. Current methods prioritize usability and safety but fall short of enabling truly empathetic, adaptive, and personalized conversations. In this paper, we propose incorporating curiosity-based intrinsic rewards into multi-turn RLHF, leveraging user models. This novel reward mechanism encourages the LLM agent to actively infer user characteristics and to steer the conversation so that the accuracy of the user model improves. As a result, the agent learns more about the user over the course of the conversation, yielding more personalized interactions. We demonstrate the effectiveness of our method in two settings: it significantly improves personalization in conversational recommendation tasks, and it personalizes conversations to accommodate diverse learning styles in education. Compared to standard multi-turn RLHF, our method shows improved generalization while maintaining conversational quality.
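To make the reward mechanism concrete, the sketch below illustrates one plausible form of a curiosity-based intrinsic reward: the agent is rewarded when its latest turn increases a user model's log-likelihood of the true user attribute (e.g., a learning style label available during training). This is a minimal, hedged illustration under assumed names (`UserModel`, `curiosity_reward`, the embedding inputs, and the weighting `beta` are all hypothetical), not the paper's actual implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Hypothetical user model: predicts a distribution over K user-attribute
# classes (e.g., learning styles) from a pooled embedding of the dialogue.
class UserModel(nn.Module):
    def __init__(self, emb_dim: int = 64, num_attr_classes: int = 4):
        super().__init__()
        self.head = nn.Linear(emb_dim, num_attr_classes)

    def forward(self, dialogue_emb: torch.Tensor) -> torch.Tensor:
        return self.head(dialogue_emb)  # logits over attribute classes


def curiosity_reward(user_model: UserModel,
                     emb_before: torch.Tensor,
                     emb_after: torch.Tensor,
                     true_attr: torch.Tensor,
                     beta: float = 1.0) -> float:
    """Intrinsic reward: how much the agent's latest turn improves the
    user model's log-likelihood of the true user attribute (assumption:
    ground-truth attributes are available during training)."""
    with torch.no_grad():
        ll_before = -F.cross_entropy(user_model(emb_before), true_attr)
        ll_after = -F.cross_entropy(user_model(emb_after), true_attr)
    return beta * (ll_after - ll_before).item()


# Usage sketch: add the intrinsic reward to the ordinary per-turn RLHF reward.
if __name__ == "__main__":
    model = UserModel()
    emb_before = torch.randn(1, 64)   # dialogue embedding before the agent's turn
    emb_after = torch.randn(1, 64)    # dialogue embedding after the agent's turn
    true_attr = torch.tensor([2])     # ground-truth attribute label (training only)
    r_extrinsic = 0.7                 # e.g., a preference/reward-model score
    r_total = r_extrinsic + curiosity_reward(model, emb_before, emb_after, true_attr)
    print(r_total)
```

In this sketch, turns that elicit information about the user raise the user model's accuracy and therefore receive a positive bonus, while uninformative turns receive none; the relative weight `beta` and the exact form of the bonus are design choices left open here.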