Daily Arxiv

This page collects papers on artificial intelligence published around the world.
Summaries are generated with Google Gemini, and the page is operated on a non-profit basis.
Copyright for each paper belongs to its authors and their institutions; when sharing, simply cite the source.

Distribution-Aligned Decoding for Efficient LLM Task Adaptation

Created by
  • Haebom

Author

Senkang Hu, Xudong Han, Jinqi Jiang, Yihang Tao, Zihan Fang, Yong Dai, Sam Tak Wu Kwong, Yuguang Fang

Outline

To reduce the cost of adapting large language models to downstream tasks, this paper proposes adjusting the output distribution directly during decoding rather than updating the model weights. The proposed method, Steering Vector Decoding (SVD), is lightweight and PEFT-compatible. After a brief warm-start fine-tuning, task-specific steering vectors are extracted from the gradient of the KL divergence between the model's output distribution and the task distribution; during decoding, these vectors shift the output distribution toward the task distribution. The authors show that SVD is equivalent to a first-order approximation of full fine-tuning and admits a globally optimal solution for the steering-vector strength. Combined with existing PEFT methods, SVD improves multiple-choice accuracy by up to 5 points, open-ended truthfulness by 2 points, and commonsense-dataset accuracy by 1-2 points across a range of tasks and benchmarks.
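As a rough intuition for the mechanism, the gradient of the KL divergence between a target task distribution and the model's softmax output has a simple closed form at the logit level, and stepping the logits against that gradient nudges the decoding distribution toward the task distribution. The sketch below is only an illustration of this idea on a toy probability vector, not the paper's implementation; the function names, the task distribution `task_probs`, and the strength `alpha` are all assumptions for demonstration.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def steering_vector(logits, task_probs):
    # For p = softmax(z), the gradient of KL(task || p) w.r.t. z is p - task_probs.
    return softmax(logits) - task_probs

def steered_decode(logits, task_probs, alpha):
    # Shift logits along the negative KL gradient, scaled by the
    # steering strength alpha, then renormalize via softmax.
    v = steering_vector(logits, task_probs)
    return softmax(logits - alpha * v)
```

For a small positive `alpha`, the steered distribution sits closer (in KL) to the task distribution than the unsteered one; in the actual method the vectors operate on an LLM's vocabulary-sized logits, and `alpha` is chosen by the paper's global-optimum criterion rather than hand-set.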

Takeaways, Limitations

Takeaways:
Adapts large language models to downstream tasks in a lightweight manner.
Compatible with existing PEFT methods, improving performance without adding parameters.
Theoretically grounded: equivalent to a first-order approximation of full fine-tuning.
Demonstrates consistent gains across a variety of tasks and benchmarks.
Limitations:
Requires a warm-start fine-tuning phase.
The optimal steering-vector strength may need tuning per model and task.
The magnitude of improvement varies across tasks.