This paper studies the inductive reasoning ability of large language models (LLMs), focusing on user preference inference rather than deductive reasoning. Capturing users' diverse preferences is a challenging problem in LLM alignment because preferences are expressed only implicitly across varied forms of interaction. We propose AlignXplore, a model that leverages extended reasoning chains to enable systematic preference inference from the behavioral signals in a user's interaction history. AlignXplore is trained by combining cold-start learning on synthetic data with subsequent online reinforcement learning, and it achieves an average performance improvement of 15.49% over existing models. In addition, we identify best practices for preference inference learning through a systematic comparison of reward modeling strategies, and we reveal the emergence of human-like inductive reasoning patterns during training.
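
As a rough illustration of the two-stage training pipeline summarized above, the sketch below shows cold-start supervised fine-tuning on synthetic reasoning traces followed by online reinforcement learning. All names (`Example`, `cold_start`, `online_rl`, `sft_step`, `rl_step`, `reward_fn`) are hypothetical placeholders, not the paper's actual implementation.

```python
# Hypothetical sketch of a two-stage AlignXplore-style training pipeline:
# (1) cold-start supervised fine-tuning on synthetic preference-reasoning traces,
# (2) online RL that reinforces preference inferences yielding higher reward.
# Function and field names are illustrative assumptions, not from the paper.

from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Example:
    interaction_history: str   # behavioral signals from the user's past interactions
    reasoning_chain: str       # extended reasoning chain inferring the user's preferences
    preference_summary: str    # explicit preference description distilled from the chain


def cold_start(model, synthetic_data: List[Example], sft_step: Callable):
    """Stage 1: supervise the model to emit a reasoning chain plus a preference summary."""
    for ex in synthetic_data:
        sft_step(
            model,
            prompt=ex.interaction_history,
            target=ex.reasoning_chain + "\n" + ex.preference_summary,
        )
    return model


def online_rl(model, histories: List[str], reward_fn: Callable, rl_step: Callable,
              epochs: int = 1):
    """Stage 2: sample preference inferences online and reinforce high-reward ones."""
    for _ in range(epochs):
        for history in histories:
            inferred = model.generate(history)   # sampled chain + inferred preferences
            reward = reward_fn(history, inferred)  # e.g., whether conditioning on the
                                                   # inferred preferences improves responses
            rl_step(model, prompt=history, sample=inferred, reward=reward)
    return model
```

The key design choice this sketch reflects is that the synthetic cold-start stage teaches the output format and basic inference behavior, while the online RL stage refines the quality of the inferred preferences using a reward signal, which is where the compared reward modeling strategies would plug in.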