Daily Arxiv

This page collects papers on artificial intelligence published around the world.
Summaries are generated with Google Gemini, and the page is operated on a non-profit basis.
Copyright for each paper belongs to its authors and their institutions; when sharing, simply cite the source.

Steering When Necessary: Flexible Steering Large Language Models with Backtracking

Created by
  • Haebom

Author

Zifeng Cheng, Jinwei Gan, Zhiwei Jiang, Cong Wang, Yafeng Yin, Xiang Luo, Yuchen Fu, Qing Gu

Outline

Aligning large language models (LLMs) with desired behaviors remains a significant challenge. Activation steering, which directly adjusts LLM activations at inference time, is a cost-effective approach to this problem. To overcome the limitations of existing steering methods, this paper proposes the Flexible Activation Steering with Backtracking (FASB) framework, which dynamically determines both the need for and the strength of intervention by considering the question together with the generated content. FASB also includes a backtracking mechanism that corrects tokens that have already deviated from the desired behavior. Experiments on the TruthfulQA dataset and six multiple-choice datasets show that the proposed method outperforms existing baselines.
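To make the underlying idea concrete, the sketch below illustrates generic inference-time activation steering: a steering vector is added to a layer's hidden activations, and only when a simple projection check suggests the model is drifting from the desired direction. This is a minimal illustrative sketch, not the paper's FASB implementation; the toy layer, steering vector, threshold, and strength are assumptions made for the example.

```python
# Illustrative sketch of inference-time activation steering (not the paper's FASB).
# The layer, steering vector, threshold, and strength below are placeholder assumptions.
import torch
import torch.nn as nn

torch.manual_seed(0)

hidden_dim = 16
layer = nn.Linear(hidden_dim, hidden_dim)      # stand-in for one transformer layer
steering_vector = torch.randn(hidden_dim)      # assumed direction toward the desired behavior
steering_vector = steering_vector / steering_vector.norm()

def steer_hook(module, inputs, output, strength=2.0, threshold=0.1):
    # Project activations onto the steering direction and intervene only when the
    # projection falls below a threshold, i.e. "steer when necessary".
    projection = output @ steering_vector
    needs_steering = projection < threshold
    correction = strength * steering_vector
    return torch.where(needs_steering.unsqueeze(-1), output + correction, output)

handle = layer.register_forward_hook(steer_hook)

activations = torch.randn(4, hidden_dim)       # a small batch of token activations
steered = layer(activations)                   # the hook adjusts activations on the fly
handle.remove()
print(steered.shape)                           # torch.Size([4, 16])
```

In FASB, by contrast, both whether to intervene and how strongly are decided dynamically from the question and the generated content, and a backtracking step can revise tokens that have already gone off course.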

Takeaways, Limitations

Takeaways:
Proposes an effective activation steering framework for aligning LLMs with desired behaviors.
Dynamically determines the need for and strength of intervention by considering both the question and the generated content.
Corrects tokens that deviate from the desired behavior via a backtracking mechanism.
Demonstrates performance superior to existing baselines across diverse datasets.
Limitations:
The available information is insufficient to identify specific limitations (e.g., computational cost, generalization to other tasks, or a detailed analysis of the backtracking mechanism).
Experiments with additional datasets and evaluation metrics are needed.