Daily Arxiv

This page organizes papers related to artificial intelligence published around the world.
Summaries are generated with Google Gemini, and the page is operated on a non-profit basis.
Copyright for each paper belongs to its authors and their institutions; when sharing, please cite the source.

Variational Reasoning for Language Models

Created by
  • Haebom

Author

Xiangxin Zhou, Zichen Liu, Haonan Wang, Chao Du, Min Lin, Chongxuan Li, Liang Wang, Tianyu Pang

Outline

This paper introduces a variational reasoning framework for language models that treats the thinking process as a latent variable and optimizes it through variational inference. Starting from the evidence lower bound (ELBO), the authors extend it to a multi-trace objective for tighter bounds and propose a forward-KL formulation that stabilizes training of the variational posterior. They further show that rejection sampling fine-tuning and binary-reward RL, including GRPO, can be interpreted as local forward-KL objectives, where an implicit weighting by model accuracy arises naturally from the derivation and reveals a previously unnoticed bias toward easier questions. The method is empirically validated on a broad range of reasoning tasks with the Qwen 2.5 and Qwen 3 model families. Overall, the study offers a principled probabilistic perspective that unifies variational inference with RL-style methods and provides stable objectives for improving the reasoning ability of language models.
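As a sketch of the starting point (generic notation assumed here, not necessarily the paper's): with question $x$, final answer $y$, and reasoning trace $z$ treated as a latent variable, the single-trace ELBO reads

```latex
\log p_\theta(y \mid x)
  \;\ge\; \mathbb{E}_{q_\phi(z \mid x, y)}\!\big[\log p_\theta(y \mid x, z)\big]
  \;-\; \mathrm{KL}\!\big(q_\phi(z \mid x, y) \,\big\|\, p_\theta(z \mid x)\big),
```

and the forward-KL variant mentioned in the summary trains the variational posterior $q_\phi$ in the opposite direction,

```latex
\mathcal{L}_{\mathrm{fKL}}(\phi)
  \;=\; \mathrm{KL}\!\big(p_\theta(z \mid x, y) \,\big\|\, q_\phi(z \mid x, y)\big),
```

which avoids the mode-seeking behavior of the reverse KL and is the direction under which rejection-sampling-style updates can be read as local objectives.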

Takeaways, Limitations

Takeaways:
Presents a novel variational inference framework for improving the reasoning ability of language models.
Extends the ELBO to a multi-trace objective that yields tighter bounds.
Proposes a forward-KL formulation that stabilizes training of the variational posterior.
Interprets rejection sampling fine-tuning and binary-reward RL (including GRPO) as local forward-KL objectives.
Reveals a bias toward easier questions arising from the implicit weighting by model accuracy.
Empirically validated on a wide range of reasoning tasks across the Qwen 2.5 and Qwen 3 model families.
Offers a principled probabilistic perspective that unifies variational inference with RL-style methods.
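A minimal sketch of the rejection-sampling view behind the easy-question bias (all names below are hypothetical stand-ins, not the paper's code): if k traces are sampled per question and only correct ones are kept for fine-tuning, a question the model answers with accuracy p contributes about k·p examples, so easier questions implicitly receive more weight.

```python
import random

def rejection_sampling_ft_batch(questions, sample_trace, reward, k=8):
    """Collect fine-tuning examples by rejection sampling: draw k
    reasoning traces per question and keep those with binary reward 1."""
    kept = []
    for q in questions:
        for _ in range(k):
            trace = sample_trace(q)
            if reward(q, trace) == 1:
                kept.append((q, trace))
    return kept

# Toy illustration: a mock "model" whose per-question accuracy we set
# directly, to show how the kept set skews toward easy questions.
random.seed(0)
accuracy = {"easy": 0.9, "hard": 0.1}

def sample_trace(q):           # hypothetical stand-in for model sampling
    return "correct" if random.random() < accuracy[q] else "wrong"

def reward(q, trace):          # binary reward on the final answer
    return 1 if trace == "correct" else 0

kept = rejection_sampling_ft_batch(["easy", "hard"], sample_trace, reward, k=1000)
counts = {q: sum(1 for qq, _ in kept if qq == q) for q in accuracy}
print(counts)  # roughly k * accuracy per question, e.g. easy ~900, hard ~100
```

The kept-example counts track k·p, which is exactly the implicit accuracy weighting the paper identifies in rejection sampling fine-tuning and binary-reward RL.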
Limitations:
The paper's Limitations are not specifically mentioned here. (Based on the summary information only.)