In this paper, we present PLUS (Preference Learning Using Summarization), a novel framework for personalizing the responses of large language models (LLMs) to individual users' preferences and goals. Traditional RLHF (Reinforcement Learning from Human Feedback) trains a single reward model for all users and therefore fails to account for user-to-user variability. PLUS instead learns a text-based summary of each user's preferences, characteristics, and past conversations; this summary conditions the reward model, enabling personalized predictions of which types of responses each user values. The user-summarization model and the reward model are trained simultaneously in an online co-adaptation loop via reinforcement learning. We demonstrate that PLUS is robust to new users and new conversation topics across diverse user datasets, and that the learned user summaries transfer to zero-shot personalization of strong proprietary models such as GPT-4. Moreover, the generated user summaries are concise, portable, and easy for users to interpret and modify, enhancing transparency and user control over LLM alignment.
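
As an illustrative aside, the minimal sketch below shows one way the summary-conditioned reward model and a single co-adaptation step could be structured. It is a sketch under assumed names only: the classes and functions (UserSummarizer, SummaryConditionedRM, update_summarizer, update_reward_model) and the toy scoring rule are hypothetical placeholders, not the actual PLUS implementation.

```python
# Illustrative sketch of a summary-conditioned reward model and one step of an
# online co-adaptation loop. All names here are hypothetical placeholders.

from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class UserSummarizer:
    """Hypothetical model that compresses a user's past conversations into a
    short, human-readable text summary of their preferences."""
    histories: Dict[str, List[str]] = field(default_factory=dict)

    def summarize(self, user_id: str) -> str:
        past = self.histories.get(user_id, [])
        return f"User {user_id}: prefers responses like {past[-3:]}"


class SummaryConditionedRM:
    """Hypothetical reward model that scores a response conditioned on both
    the conversation context and the user's text summary."""

    def score(self, summary: str, context: str, response: str) -> float:
        # Toy stand-in for a learned scorer: word overlap with the summary.
        return float(len(set(summary.lower().split()) & set(response.lower().split())))


def co_adaptation_step(batch, summarizer: UserSummarizer, rm: SummaryConditionedRM) -> float:
    """One co-adaptation step: summarize each user, rank the chosen vs. rejected
    response under the summary-conditioned reward model, and (in a real system)
    update both models from the preference signal. Returns ranking accuracy."""
    n_correct = 0
    for user_id, context, chosen, rejected in batch:
        summary = summarizer.summarize(user_id)
        correct = rm.score(summary, context, chosen) >= rm.score(summary, context, rejected)
        n_correct += int(correct)
        # Placeholders for the actual updates (e.g., RL on the summarizer,
        # a preference loss on the reward model):
        # update_summarizer(summarizer, reward=float(correct))
        # update_reward_model(rm, summary, context, chosen, rejected)
    return n_correct / max(len(batch), 1)
```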