Daily Arxiv

This is a page that curates AI-related papers published worldwide.
All content here is summarized using Google Gemini and operated on a non-profit basis.
Copyright for each paper belongs to the authors and their institutions; please make sure to credit the source when sharing.

Learning Temporal Abstractions via Variational Homomorphisms in Option-Induced Abstract MDPs

Created by
  • Haebom

Authors

Chang Li, Yaren Zhang, Haoran Lv, Qiong Cao, Chao Xue, Xiaodong He

Overview

This paper presents a framework for efficient implicit reasoning in large language models (LLMs). Conventional chain-of-thought (CoT) prompting is computationally expensive and slow, so the authors propose reasoning in a latent space without explicitly generating the intermediate steps as text. To this end, they model latent thoughts as temporally extended abstract actions (options) within a hierarchical reinforcement learning framework, and learn a diverse set of options as latent embeddings using the variational Markov option critic (VMOC) algorithm. They extend the theory of continuous MDP homomorphisms to prove that learning a policy in the abstract latent space preserves the optimal solution of the original, more complex problem. They also propose a cold-start procedure that uses supervised fine-tuning (SFT) data to distill human reasoning demonstrations into the latent option space. Experimental results on complex logical reasoning benchmarks and locomotion tasks demonstrate the effectiveness of the proposed framework.
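
For readers unfamiliar with the term, the following is a sketch of the standard MDP homomorphism conditions in the finite, discrete case, as usually stated in the reinforcement learning literature. It is not the paper's own formulation, which extends the continuous version of the theory to option-induced abstract MDPs. The symbols f, g_s, and the barred quantities are notation assumed here for illustration.

```latex
% Original MDP:  M = (S, A, P, R)
% Abstract MDP:  \bar{M} = (\bar{S}, \bar{A}, \bar{P}, \bar{R})
% A homomorphism h = (f, \{g_s\}_{s \in S}) consists of a state map
% f : S \to \bar{S} and state-dependent action maps g_s : A \to \bar{A}
% such that, for all s, s' \in S and a \in A:

% (1) rewards are preserved
\bar{R}\bigl(f(s),\, g_s(a)\bigr) = R(s, a)

% (2) transition probabilities are preserved up to aggregation
\bar{P}\bigl(f(s') \mid f(s),\, g_s(a)\bigr)
  = \sum_{s'' \in f^{-1}(f(s'))} P(s'' \mid s, a)
```

Under these two conditions, the optimal value of a state in the original MDP equals the optimal value of its image in the abstract MDP, so an optimal abstract policy can be lifted back to an optimal policy for the original problem. This is the sense in which learning in the abstract space "preserves the optimal solution."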

Takeaways, Limitations

Takeaways:
A novel framework for efficient implicit reasoning in LLMs
A latent-space reasoning method that addresses the computational cost and slow speed of explicit CoT
Learning effective latent thought processes using the variational Markov option critic (VMOC) algorithm
A theoretical foundation established by extending continuous MDP homomorphism theory
A cold-start procedure that uses supervised fine-tuning (SFT) data
Demonstrated effectiveness on logical reasoning and control tasks
Limitations:
Further research is needed on the generalization performance of the proposed framework.
Need to evaluate applicability and scalability to various types of problems
Need to improve learning stability and efficiency of VMOC algorithm
Research is needed to make the latent space interpretable and transparent.