Daily Arxiv

This page curates AI-related papers published worldwide.
All content is summarized using Google Gemini, and the site is operated on a non-profit basis.
Copyright for each paper belongs to its authors and their institutions; please credit the source when sharing.

Understanding Reasoning in Thinking Language Models via Steering Vectors

Created by
  • Haebom

Authors

Constantin Venhoff, Iván Arcuschin, Philip Torr, Arthur Conmy, Neel Nanda

Outline

In this paper, we present a novel method for controlling the reasoning process of large language models (LLMs) with thinking capabilities. Through experiments on 500 tasks across 10 diverse categories using DeepSeek-R1-Distill models, we identify several recurring reasoning behaviors, including expressing uncertainty, generating examples for hypothesis testing, and backtracking during reasoning. We show that these behaviors are linearly mediated in the model's activation space and can be controlled with steering vectors. The paper provides a method for extracting these vectors and applying them to modulate specific aspects of the model's reasoning process, such as the tendency to backtrack or to express uncertainty. We verify the consistency of the steering method across three DeepSeek-R1-Distill models.
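To make the idea concrete, the sketch below shows the common difference-of-means recipe for building a steering vector and adding it to the residual stream at generation time. This is a rough illustration only, not the authors' exact procedure: the checkpoint name, layer index, steering strength, and contrast prompts are all illustrative assumptions rather than values from the paper.

```python
# Minimal difference-of-means steering sketch (illustrative; not the paper's exact method).
# Assumptions: a DeepSeek-R1-Distill checkpoint, a hand-picked layer, toy contrast prompts.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL = "deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B"  # assumed checkpoint
LAYER = 12    # assumed mid-depth layer; the paper selects layers empirically
ALPHA = 4.0   # assumed steering strength

tok = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForCausalLM.from_pretrained(
    MODEL, torch_dtype=torch.float16, device_map="auto"
)
model.eval()

def mean_activation(texts):
    """Mean last-token residual-stream activation after LAYER."""
    acts = []
    for t in texts:
        ids = tok(t, return_tensors="pt").to(model.device)
        with torch.no_grad():
            out = model(**ids, output_hidden_states=True)
        acts.append(out.hidden_states[LAYER + 1][0, -1])
    return torch.stack(acts).mean(dim=0)

# Toy contrast sets: text that shows backtracking vs. text that does not.
with_behavior = ["Wait, that can't be right. Let me go back and redo the previous step."]
without_behavior = ["The answer follows directly from the computation above."]

steer = mean_activation(with_behavior) - mean_activation(without_behavior)
steer = steer / steer.norm()  # keep only the direction

def add_steering(module, inputs, output):
    # Shift the residual stream along the steering direction at every position.
    hidden = output[0] if isinstance(output, tuple) else output
    hidden = hidden + ALPHA * steer.to(hidden.dtype)
    return (hidden,) + output[1:] if isinstance(output, tuple) else hidden

# Qwen2-style module path; other architectures name their layer stack differently.
handle = model.model.layers[LAYER].register_forward_hook(add_steering)
prompt = tok("Solve step by step: 17 * 23 = ?", return_tensors="pt").to(model.device)
print(tok.decode(model.generate(**prompt, max_new_tokens=200)[0]))
handle.remove()  # restore unsteered behavior
```

Raising or lowering ALPHA (or negating the vector) would, under these assumptions, strengthen or suppress the targeted behavior, which is the kind of modulation the paper demonstrates for backtracking and uncertainty expression.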

Takeaways, Limitations

Takeaways:
Presents a novel method for controlling and manipulating the reasoning process of LLMs with thinking abilities.
Provides practical tools to identify and steer the model's reasoning behaviors (such as uncertainty expression, example generation, and backtracking).
Demonstrates consistent steering performance across multiple DeepSeek-R1-Distill models.
Improves our understanding and interpretability of the reasoning process.
Limitations:
The method is validated only on DeepSeek-R1-Distill models, so its generalizability to other LLM architectures requires further study.
The description of how the steering vectors are extracted and applied may lack detail.
Although the 500 tasks span a wide range of categories, they may not cover all types of reasoning tasks.
The range of reasoning behaviors that can be steered may be limited.