Daily Arxiv

This is a page that curates AI-related papers published worldwide.
All content here is summarized using Google Gemini and operated on a non-profit basis.
Copyright for each paper belongs to the authors and their institutions; please make sure to credit the source when sharing.

A Theory of Inference Compute Scaling: Reasoning through Directed Stochastic Skill Search

Created by
  • Haebom

Authors

Austin R. Ellis-Mohr, Anuj K. Nayak, Lav R. Varshney

Outline

This paper notes that the inference-time compute cost of large language models (LLMs) is growing, and proposes directed stochastic skill search (DS3), a framework that represents the inference process as a stochastic search over a learned skill graph. DS3 yields analytical expressions for task success probability and computational cost under various inference strategies, including chain-of-thought (CoT) and tree-of-thought (ToT), enabling comparative analysis as a function of task difficulty and model capability. By extending a tripartite graph framework for LLM training to incorporate inference, and by connecting DS3 to empirical methods that characterize LLM scaling behavior, the authors theoretically reproduce experimentally observed patterns: linear accuracy scaling with logarithmic compute, variation of the optimal inference strategy with task difficulty and model capability, emergent gains from inference even when performance plateaus under parameter scaling, and the behavior of best-of-N (BoN) sampling and majority voting, all captured within a unified analytical framework. By explicitly characterizing training-inference interdependencies, the framework deepens theoretical understanding and supports principled algorithm design and resource allocation.
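To make the BoN-versus-majority-voting comparison concrete, here is a minimal sketch of the standard closed forms for both strategies, assuming independent attempts with a fixed per-attempt success probability `p` (a simplification; the paper's DS3 formulas derive these quantities from traversal of a skill graph rather than from a single scalar `p`):

```python
from math import comb

def best_of_n(p: float, n: int) -> float:
    """Success probability of best-of-N with a perfect verifier:
    the task succeeds if at least one of N independent attempts does."""
    return 1.0 - (1.0 - p) ** n

def majority_vote(p: float, n: int) -> float:
    """Success probability of majority voting over N independent attempts,
    assuming all incorrect attempts agree on one wrong answer (binary case)."""
    k_min = n // 2 + 1  # strict majority threshold
    return sum(comb(n, k) * p**k * (1.0 - p) ** (n - k)
               for k in range(k_min, n + 1))

if __name__ == "__main__":
    # BoN always improves with N, while majority voting only helps
    # when the per-attempt success probability exceeds 1/2.
    for p in (0.3, 0.6):
        for n in (1, 5, 25):
            print(f"p={p:.1f} N={n:2d}: "
                  f"BoN={best_of_n(p, n):.3f} "
                  f"majority={majority_vote(p, n):.3f}")
```

This illustrates the qualitative behavior the paper recovers analytically: for a weak model (p below 1/2), scaling up majority voting hurts while BoN still helps, so the optimal strategy depends on model capability.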

Takeaways, Limitations

Takeaways:
Presents DS3, a new theoretical framework for efficiently managing LLM inference compute.
Analytically predicts the optimal inference strategy as a function of task difficulty and model capability.
Deepens understanding of the interdependence between training and inference.
Supports principled algorithm design and resource-allocation strategies.
Theoretically explains experimentally observed LLM scaling behavior.
Limitations:
No empirical application or performance evaluation of the DS3 framework on real LLMs.
The generalizability of the proposed analytical expressions to the complexity of real LLMs remains to be verified.
Generality across different LLM types and tasks still needs to be confirmed.
Lacks concrete guidelines for resource-allocation strategies in real-world settings.