Daily Arxiv

This is a page that curates AI-related papers published worldwide.
All content here is summarized using Google Gemini and operated on a non-profit basis.
Copyright for each paper belongs to the authors and their institutions; please make sure to credit the source when sharing.

Extended Inductive Reasoning for Personalized Preference Inference from Behavioral Signals

Created by
  • Haebom

Author

Jia-Nan Li, Jian Guan, Wei Wu, Rui Yan

Outline

This paper studies the inductive reasoning ability of large language models (LLMs), focusing on inferring user preferences rather than on deductive reasoning. Capturing users' diverse preferences is a challenging problem for LLM alignment because those preferences are only implicitly expressed across varied forms of interaction. The authors propose AlignXplore, a model that uses extended reasoning chains to systematically infer preferences from behavioral signals in a user's interaction history. AlignXplore is trained by combining cold-start learning on synthetic data with online reinforcement learning, and achieves an average performance improvement of 15.49% over existing models. The paper also identifies best practices for preference-inference training through a systematic comparison of reward modeling strategies, and reveals the emergence of human-like inductive reasoning patterns during training.
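The streaming setup described above can be sketched as a simple loop: the model keeps a running preference description and folds in each new behavioral signal as it arrives. This is a minimal illustrative sketch, not the AlignXplore implementation; the function names and the string-based "refinement" step stand in for what would actually be LLM-generated reasoning chains.

```python
# Hypothetical sketch of streaming preference inference. In AlignXplore the
# refinement step would be an LLM producing an extended reasoning chain; here
# we simply accumulate short summaries of each signal to show the data flow.

def refine_preference(current_description: str, signal: dict) -> str:
    """Fold one behavioral signal into the running preference description."""
    note = f"prefers '{signal['chosen']}' over '{signal['rejected']}'"
    if not current_description:
        return note
    return current_description + "; " + note

def streaming_inference(signals) -> str:
    """Process signals one at a time, keeping only the latest description.

    Because each step depends only on the current description and the new
    signal, the full interaction history never has to be re-read.
    """
    description = ""
    for signal in signals:
        description = refine_preference(description, signal)
    return description

history = [
    {"chosen": "concise answer", "rejected": "verbose answer"},
    {"chosen": "code example", "rejected": "prose only"},
]
print(streaming_inference(history))
```

The point of the sketch is the incremental update: each new signal refines the existing description instead of triggering re-inference over the whole history, which is what makes streaming inference efficient.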

Takeaways, Limitations

Takeaways:
A new approach to improving LLMs' inductive reasoning ability
An effective solution to the user preference inference problem (the AlignXplore model)
Enables efficient streaming inference and iterative refinement of preferences
Demonstrates strong generalization across diverse input formats and downstream models
Identifies best practices for preference-inference training
Observation of human-like inductive reasoning patterns during training
Limitations:
Reliance on synthetic data for cold-start learning
Generalization to real user data requires further validation
Possible bias toward certain types of user interactions
Further research is needed on the interpretability of the model's inference process