Daily Arxiv

This page collects papers on artificial intelligence published around the world.
Summaries are generated with Google Gemini, and the page is operated on a non-profit basis.
Copyright of each paper belongs to its authors and their institutions; when sharing, please cite the source.

Enhancing LLM Steering through Sparse Autoencoder-Based Vector Refinement

Created by
  • Haebom

Authors

Anyi Wang, Xuansheng Wu, Dong Shu, Yunpu Ma, Ninghao Liu

SAE-RSV: Steering Vector Refinement Using Sparse Autoencoders

Outline

This paper focuses on steering, a promising approach for controlling the behavior of large language models (LLMs) without modifying their parameters. Existing steering methods rely on large datasets to capture a clear behavioral signal; steering vectors learned from small datasets often contain task-irrelevant noisy features and therefore steer ineffectively. To address this, the paper proposes SAE-RSV (Refinement of Steering Vector via Sparse Autoencoder), which semantically denoises and augments steering vectors using a sparse autoencoder (SAE). The SAE is used to remove task-irrelevant features and to add task-relevant features that are missing from the small dataset, identified by their semantic similarity to the relevant features already present. Experiments show that SAE-RSV outperforms all baseline methods, including supervised fine-tuning, demonstrating that refining the original steering vector through an SAE yields effective steering vectors from limited training data.
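
The sketch below illustrates the core refinement idea under stated assumptions: a toy SAE with random weights, a placeholder top-k relevance criterion, and an arbitrary similarity threshold stand in for the paper's actual SAE, feature-selection procedure, and hyperparameters.

```python
# Illustrative sketch of SAE-based steering-vector refinement (not the
# authors' code). All weights, thresholds, and scales are placeholders.
import numpy as np

rng = np.random.default_rng(0)
d_model, d_sae = 64, 256

# Toy SAE; in practice the encoder/decoder come from a pretrained SAE.
W_enc = rng.normal(size=(d_model, d_sae)) / np.sqrt(d_model)
W_dec = rng.normal(size=(d_sae, d_model)) / np.sqrt(d_sae)

def sae_encode(x):
    return np.maximum(x @ W_enc, 0.0)   # ReLU -> sparse latent features

def sae_decode(z):
    return z @ W_dec

# 1) Raw steering vector from a small dataset: difference of mean activations
#    between positive and negative examples (a common construction).
pos_acts = rng.normal(size=(16, d_model))
neg_acts = rng.normal(size=(16, d_model))
v_raw = pos_acts.mean(axis=0) - neg_acts.mean(axis=0)

# 2) Denoise: keep only latents judged task-relevant (here simply the
#    top-k activations, purely as a placeholder criterion).
z = sae_encode(v_raw)
relevant = np.argsort(z)[-20:]
z_refined = np.zeros_like(z)
z_refined[relevant] = z[relevant]

# 3) Augment: activate additional latents whose decoder directions are
#    semantically similar to the relevant ones (cosine similarity over W_dec).
dirs = W_dec / np.linalg.norm(W_dec, axis=1, keepdims=True)
sims = (dirs @ dirs[relevant].T).max(axis=1)      # (d_sae,)
similar = np.where(sims > 0.3)[0]                 # illustrative threshold
fill = 0.1 * z[relevant].mean()                   # illustrative strength
z_refined[similar] = np.maximum(z_refined[similar], fill)

# 4) Decode back to the model's activation space.
v_refined = sae_decode(z_refined)
print(v_raw.shape, v_refined.shape)
```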

Takeaways, Limitations

Takeaways:
  • Presents a new methodology that enables effective LLM control even with small datasets (a minimal application sketch follows this list).
  • Denoises and augments steering vectors using an SAE.
  • Demonstrates superior performance compared to existing methods.
Limitations:
  • Lacks detail on the specific SAE architecture and hyperparameters.
  • Further research is needed to establish the generalizability of the proposed method.
  • The computational cost of the SAE-based refinement process is not addressed.
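
For context, a steering vector (refined or not) is typically applied at inference by adding it to a layer's hidden states. The sketch below shows that application step with a forward hook on GPT-2; the model, layer index, scale, and the random stand-in vector are illustrative assumptions, not the paper's configuration.

```python
# Hedged sketch: applying a steering vector by adding it to one layer's
# hidden states during generation. Uses GPT-2 purely as an example model.
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tok = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2").eval()

layer_idx, scale = 6, 4.0                        # illustrative choices
steer = torch.randn(model.config.n_embd)         # stand-in for v_refined

def add_steering(module, inputs, output):
    # GPT-2 blocks return a tuple; hidden states are the first element.
    hidden = output[0] + scale * steer.to(output[0].dtype)
    return (hidden,) + output[1:]

handle = model.transformer.h[layer_idx].register_forward_hook(add_steering)
ids = tok("The movie was", return_tensors="pt")
with torch.no_grad():
    out = model.generate(**ids, max_new_tokens=20, do_sample=False)
print(tok.decode(out[0], skip_special_tokens=True))
handle.remove()                                  # restore unsteered behavior
```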