
Daily Arxiv

This is a page that curates AI-related papers published worldwide.
All content here is summarized using Google Gemini, and the site is operated on a non-profit basis.
Copyright for each paper belongs to the authors and their institutions; please make sure to credit the source when sharing.

Task-Circuit Quantization: Leveraging Knowledge Localization and Interpretability for Compression

Created by
  • Haebom

Authors

Hanqi Xiao, Yi-Lin Sung, Elias Stengel-Eskin, Mohit Bansal

Outline

In this paper, we propose Task-Circuit Quantization (TaCQ), a novel mixed-precision post-training quantization technique that addresses the performance degradation seen in low-bit (2-3 bit) quantization. TaCQ directly conditions the quantization process on weight circuits, i.e., sets of weights tied to performance on a specific task. Weights that are important for task performance are kept at 16 bits while the remaining weights are quantized, reducing memory usage while minimizing performance loss. Gradient information is used to predict how quantization will change each weight and how that change will affect task performance. Experiments with both general-purpose and task-specific calibration data show that TaCQ outperforms existing methods across tasks (QA, mathematical reasoning, text-to-SQL) and models (Llama-3, Qwen2.5), with especially large gains over existing state-of-the-art methods in the 2-bit and 3-bit settings.
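To make the idea concrete, here is a minimal PyTorch sketch of gradient-based mixed-precision selection. It is not the authors' implementation: the round-to-nearest quantizer, the 1% keep fraction, and the saliency score |grad * (w_q - w)| are simplifying assumptions used only to illustrate the mechanism of predicting each weight's quantization impact from gradients and keeping the most salient weights in 16-bit.

```python
# Illustrative sketch (not the TaCQ code): quantize a weight tensor to low bits,
# but keep the entries whose quantization is predicted to hurt the task loss
# most in 16-bit. The prediction is a first-order estimate grad * (w_q - w)
# using a gradient collected on calibration data.
import torch

def quantize_rtn(w: torch.Tensor, bits: int = 3) -> torch.Tensor:
    # Simple symmetric per-tensor round-to-nearest quantizer (illustrative only).
    qmax = 2 ** (bits - 1) - 1
    scale = w.abs().max() / qmax
    return torch.round(w / scale).clamp(-qmax - 1, qmax) * scale

def mixed_precision_quantize(weight: torch.Tensor,
                             grad: torch.Tensor,
                             bits: int = 3,
                             keep_frac: float = 0.01) -> torch.Tensor:
    w_q = quantize_rtn(weight, bits)
    # Predicted loss impact of quantizing each weight (first-order Taylor term).
    saliency = (grad * (w_q - weight)).abs()
    k = max(1, int(keep_frac * weight.numel()))
    topk_idx = torch.topk(saliency.flatten(), k).indices
    mixed = w_q.flatten().clone()
    # Keep the most salient weights at 16-bit precision; the rest stay quantized.
    mixed[topk_idx] = weight.flatten()[topk_idx].to(torch.float16).to(weight.dtype)
    return mixed.view_as(weight)

# Usage: after a backward pass on task (or general) calibration data,
# layer.weight.grad holds the gradient used for the saliency estimate:
# layer.weight.data = mixed_precision_quantize(layer.weight.data, layer.weight.grad)
```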

Takeaways, Limitations

Takeaways:
Presents TaCQ, a novel mixed-precision quantization technique that effectively mitigates performance degradation in low-bit quantization.
Minimizes the impact on task performance by keeping task-critical weights at 16 bits.
Demonstrates superior performance over existing methods on large language models such as Llama-3 and Qwen2.5, especially in 2-3 bit quantization.
Shows improvements even without task-specific calibration data, so it remains effective in general-purpose settings.
Maintains high performance at low average bit-widths (3.1 bits), retaining 96% of unquantized Llama-3-8B-Instruct performance.
Limitations:
The effectiveness of TaCQ may vary across tasks and models; additional experiments with other models and tasks are needed.
Further research is needed on how to define weight circuits and on the criteria for selecting important weights.
The memory savings depend on the fraction of important weights kept at 16 bits (see the sketch below); further research is needed to determine the optimal fraction.
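For intuition on this tradeoff, a rough back-of-the-envelope calculation (illustrative numbers, not reported in the paper): if a fraction p of weights stays at 16 bits and the rest is quantized to b bits, the average bit-width is about p·16 + (1−p)·b. For example, keeping roughly 1% of weights at 16 bits with 3-bit quantization for the rest works out to about 3.1 bits per weight, the same ballpark as the setting cited above.

```python
# Average bits per weight when a fraction p stays at 16-bit and the rest is
# quantized to b bits (illustrative numbers only, not figures from the paper).
def avg_bits(p: float, b: int) -> float:
    return p * 16 + (1 - p) * b

for p in (0.005, 0.01, 0.02):
    print(f"keep {p:.1%} at 16-bit, rest at 3-bit -> {avg_bits(p, 3):.2f} bits/weight")
```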