Daily Arxiv

This page curates AI-related papers published worldwide.
All content is summarized by Google Gemini, and the page is operated on a non-profit basis.
Copyright for each paper belongs to its authors and their institutions; please credit the source when sharing.

Quantization Meets dLLMs: A Systematic Study of Post-training Quantization for Diffusion LLMs

Created by
  • Haebom

Author

Haokun Lin, Haobo Xu, Yichen Wu, Ziyu Guo, Renrui Zhang, Zhichao Lu, Ying Wei, Qingfu Zhang, Zhenan Sun

Outline

This paper presents the first systematic study of low-bit quantization for diffusion-based large language models (dLLMs). Unlike autoregressive (AR) LLMs, dLLMs rely on full attention and denoising-based decoding strategies, and their large parameter counts and high resource requirements hinder deployment on edge devices. The study identifies an outlier problem in dLLM activation values and, using state-of-the-art post-training quantization (PTQ) techniques, conducts a comprehensive evaluation across bit width, quantization method, task type, and model type. The aim is to provide practical insights into the quantization behavior of dLLMs and to lay the foundation for efficient dLLM deployment.
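To make the activation-outlier issue concrete, below is a minimal sketch (not the paper's code) of simulated low-bit post-training quantization: symmetric uniform fake quantization with per-channel weight scales, comparing per-tensor versus per-channel activation scales when one activation channel carries large outliers. All names, shapes, and values are illustrative assumptions.

```python
# Illustrative sketch only: how activation outliers degrade per-tensor
# activation quantization in a W4A8 setting. Not the paper's implementation.
import numpy as np

def quantize(x, num_bits=8, axis=None):
    """Symmetric uniform fake quantization.
    axis=None -> one scale for the whole tensor; otherwise one scale per
    slice along `axis` (per-channel)."""
    qmax = 2 ** (num_bits - 1) - 1
    max_abs = np.max(np.abs(x), axis=axis, keepdims=axis is not None)
    scale = np.maximum(max_abs, 1e-8) / qmax
    q = np.clip(np.round(x / scale), -qmax - 1, qmax)
    return q * scale  # de-quantized values

rng = np.random.default_rng(0)
acts = rng.normal(size=(16, 512))        # hypothetical token activations
acts[:, 7] *= 80.0                       # inject a channel-wise outlier
weights = rng.normal(size=(512, 512)) * 0.02

ref = acts @ weights
w4 = quantize(weights, num_bits=4, axis=0)         # per-channel 4-bit weights
a8_tensor = quantize(acts, num_bits=8, axis=None)  # per-tensor 8-bit activations
a8_channel = quantize(acts, num_bits=8, axis=0)    # per-channel 8-bit activations

rel_err = lambda y: np.linalg.norm(y - ref) / np.linalg.norm(ref)
print("W4A8, per-tensor activation scale :", rel_err(a8_tensor @ w4))
print("W4A8, per-channel activation scale:", rel_err(a8_channel @ w4))
```

With a single outlier channel, the per-tensor scale is dominated by that channel and the remaining channels lose resolution, so the per-tensor error is noticeably higher; this is the kind of behavior that outlier-aware PTQ methods target.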

Takeaways, Limitations

Takeaways:
Presents the first systematic study of low-bit quantization for dLLMs.
Identifies the activation outlier problem that arises during dLLM quantization.
Analyzes dLLM quantization performance across bit width, quantization method, task type, and model type.
Provides practical guidance for efficient dLLM deployment.
Releases code and experimental setups publicly.
Limitations:
The range of dLLMs and quantization techniques covered may be limited.
Performance evaluation in actual edge-device deployment environments may be lacking.
Optimization solutions for specific hardware platforms are not provided.