Daily Arxiv

This page curates AI-related papers published worldwide.
All content is summarized using Google Gemini, and the page is operated on a non-profit basis.
Copyright for each paper belongs to its authors and their institutions; please credit the source when sharing.

DilateQuant: Accurate and Efficient Diffusion Quantization via Weight Dilation

Created by
  • Haebom

Author

Xuewen Liu, Zhikai Li, Minhao Jiang, Mengjuan Chen, Jianquan Li, Qingyi Gu

Outline

This paper focuses on model quantization, a promising approach to accelerating and compressing diffusion models. Quantization-Aware Training (QAT) is essential because conventional Post-Training Quantization (PTQ) suffers severe performance degradation at low bit-widths. However, the wide range and time-varying nature of activations in diffusion models sharply increase the complexity of quantization, which undermines the efficiency of existing QAT methods. To address these problems, this paper proposes DilateQuant, a novel QAT framework. DilateQuant reduces quantization error and eases model convergence via Weight Dilation (WD), which dilates unsaturated input-channel weights into a constrained range, shrinking the activation range while preserving the original weight range. In addition, it introduces a Temporal Parallel Quantizer (TPQ) to handle the time-varying activations and Block-wise Knowledge Distillation (BKD) to reduce training resource consumption. Experimental results show that DilateQuant outperforms existing methods in both accuracy and efficiency.
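Conceptually, WD can be read as a per-channel rescaling between activations and weights that leaves the layer output unchanged. The snippet below is a minimal PyTorch sketch of that reading, not the authors' implementation; the function name `weight_dilation` and the rule used to pick the per-channel factors are illustrative assumptions.

```python
import torch

def weight_dilation(weight: torch.Tensor, act: torch.Tensor):
    """Toy sketch of the Weight Dilation idea (illustrative, not the authors' code).

    For a linear layer y = act @ weight.T, multiplying weight column k by s_k and
    dividing activation channel k by s_k leaves y unchanged. Choosing s_k >= 1 so
    that each unsaturated input channel is dilated up to the layer's existing
    per-channel weight maximum shrinks the activation range without widening the
    weight range the weight quantizer must cover.
    """
    # weight: [out_features, in_features], act: [batch, in_features]
    w_max = weight.abs().amax(dim=0)        # per-input-channel weight maxima
    bound = w_max.max()                     # overall weight range stays fixed

    # Dilate unsaturated channels toward the bound, never past it (scale >= 1).
    scale = (bound / w_max.clamp(min=1e-8)).clamp(min=1.0)

    dilated_weight = weight * scale         # broadcasts over in_features
    scaled_act = act / scale                # activations shrink accordingly
    return dilated_weight, scaled_act


# Quick equivalence check on random data.
w = torch.randn(8, 16)
x = torch.randn(4, 16)
w2, x2 = weight_dilation(w, x)
assert torch.allclose(x @ w.T, x2 @ w2.T, atol=1e-5)
```

In a real pipeline the activation rescaling would presumably be folded into the preceding layer or the activation quantizer rather than applied at runtime; it is returned explicitly here only to make the equivalence check easy.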

Takeaways, Limitations

Takeaways:
Presents DilateQuant, a novel QAT framework for efficient quantization of diffusion models.
Weight Dilation (WD) reduces the activation range while preserving the original weight range, lowering quantization error and improving model convergence.
Temporal Parallel Quantizer (TPQ) and Block-wise Knowledge Distillation (BKD) improve accuracy and training efficiency.
Achieves higher accuracy and efficiency than existing methods.
Limitations:
The performance improvements of DilateQuant may be limited to specific diffusion models or datasets.
Further research is needed on hyperparameter optimization of WD, TPQ, and BKD.
A more in-depth comparative analysis with other quantization techniques may be needed.