Daily Arxiv

This page curates AI-related papers published worldwide.
All content is summarized by Google Gemini, and the site is operated on a non-profit basis.
Copyright for each paper belongs to its authors and their institutions; please credit the source when sharing.

Understanding Reasoning in Thinking Language Models via Steering Vectors

Created by
  • Haebom

Authors

Constantin Venhoff, Ivan Arcuschin, Philip Torr, Arthur Conmy, Neel Nanda

Outline

This paper presents a novel method for controlling the reasoning process of thinking large language models (LLMs). Through experiments on 500 tasks across 10 diverse categories with DeepSeek-R1-Distill models, the authors identify several recurring reasoning behaviors, including expressing uncertainty, generating examples for hypothesis testing, and backtracking during reasoning. They show that these behaviors are mediated by linear directions in the model's activation space and can be modulated with steering vectors. The method enables control over specific aspects of the reasoning process (e.g., the tendency to backtrack or to express uncertainty) and achieves consistent steering performance across three DeepSeek-R1-Distill models, providing a practical tool for steering thinking models in a controllable and interpretable way.
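To make the mechanism concrete, below is a minimal sketch of difference-of-means activation steering, assuming a Hugging Face DeepSeek-R1-Distill checkpoint. The layer index, steering strength, contrast prompt sets, and the `steer` hook are illustrative assumptions for this sketch, not the authors' released code.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumed checkpoint; any DeepSeek-R1-Distill variant would work the same way.
MODEL = "deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B"
LAYER = 12   # assumption: a mid-depth layer where the behavior is linearly represented
ALPHA = 4.0  # assumed steering strength; flipping the sign suppresses the behavior

tok = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForCausalLM.from_pretrained(MODEL, torch_dtype=torch.float32)
model.eval()

# Placeholder contrast sets; the paper derives these from labeled reasoning
# traces across 10 task categories.
backtracking_prompts = ["Wait, that approach fails, so let me reconsider the problem."]
plain_prompts = ["The answer follows directly from the definition."]

def mean_activation(prompts, layer):
    """Average residual-stream activation at `layer` over each prompt's last token."""
    acts = []
    for p in prompts:
        ids = tok(p, return_tensors="pt")
        with torch.no_grad():
            out = model(**ids, output_hidden_states=True)
        acts.append(out.hidden_states[layer][0, -1])
    return torch.stack(acts).mean(dim=0)

# Difference-of-means steering vector: mean activation on prompts exhibiting
# the behavior minus the mean on prompts that do not.
v = mean_activation(backtracking_prompts, LAYER) - mean_activation(plain_prompts, LAYER)
v = v / v.norm()

def steer(module, inputs, output):
    # Add the (scaled) steering vector to the residual stream at every position.
    hidden = output[0] if isinstance(output, tuple) else output
    hidden = hidden + ALPHA * v.to(hidden.dtype)
    return (hidden,) + output[1:] if isinstance(output, tuple) else hidden

# Apply the vector during generation via a forward hook on the chosen layer.
handle = model.model.layers[LAYER].register_forward_hook(steer)
ids = tok("Prove that the sum of two odd numbers is even.", return_tensors="pt")
steered = model.generate(**ids, max_new_tokens=200)
handle.remove()
print(tok.decode(steered[0], skip_special_tokens=True))
```

Varying ALPHA in either direction then amplifies or dampens the targeted behavior (here, backtracking) without retraining the model.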

Takeaways, Limitations

Takeaways:
Presents a novel method for controlling the reasoning process of thinking LLMs.
Demonstrates that reasoning behaviors can be modulated via linear directions in the model's activation space.
Enables control of diverse reasoning behaviors such as expressing uncertainty, hypothesis testing, and backtracking.
Verifies consistent steering performance across three DeepSeek-R1-Distill models.
Provides a practical tool for making the reasoning process controllable and interpretable.
Limitations:
The method is specialized to DeepSeek-R1-Distill models; generalizability to other LLMs requires verification.
The range of controllable reasoning behaviors may be limited.
The complexity and computational cost of extracting and applying steering vectors must be considered.
Results come from a limited-scale experiment of 500 tasks, so broader generalization remains to be established.