Daily Arxiv

This page curates AI-related papers published worldwide.
All summaries are generated with Google Gemini, and the page is operated on a non-profit basis.
Copyright for each paper belongs to its authors and their institutions; please credit the source when sharing.

DLLMQuant: Quantizing Diffusion-based Large Language Models

Created by
  • Haebom

Authors

Chen Xu, Dawei Yang

Outline

This paper presents a quantization technique for efficiently deploying diffusion-based large language models (DLLMs). When applied to DLLMs, existing post-training quantization (PTQ) techniques suffer from degraded accuracy and generalization because they conflict with core DLLM mechanisms such as dynamic masking, iterative generation, and bidirectional attention. The paper therefore proposes the DLLMQuant framework, which comprises three novel techniques: TMAS, a compensation technique that accounts for both temporal and masking factors; IA-AQ, which dynamically allocates quantization resources using the interaction signals of bidirectional attention; and CGQ, which uses mask states and token scores for error correction. Experiments show that DLLMQuant delivers significant performance gains alongside improved efficiency.
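To make the abstract idea concrete, here is a minimal, illustrative sketch of calibration that spans diffusion timesteps and mask states, the general problem TMAS targets: quantization scales are gathered over the whole denoising trajectory rather than a single forward pass. This is not the paper's algorithm; `denoise_step`, `mask_tokens`, and `mask_id` are hypothetical stand-ins for a DLLM's actual interface.

```python
import torch
import torch.nn as nn

def mask_tokens(tokens, ratio, mask_id=0):
    """Replace a random fraction of token ids with a [MASK] id.
    (mask_id=0 is a placeholder; a real DLLM defines its own.)"""
    mask = torch.rand(tokens.shape) < ratio
    return torch.where(mask, torch.full_like(tokens, mask_id), tokens)

def collect_calibration_stats(model, prompts, num_steps=8,
                              mask_ratios=(0.25, 0.5, 0.75)):
    """Accumulate per-layer absolute-max activation statistics across
    diffusion timesteps and mask ratios, so that quantization scales
    reflect the full iterative-generation trajectory."""
    stats = {}

    def make_hook(name):
        def hook(module, inputs, output):
            amax = output.detach().abs().amax().item()
            stats[name] = max(stats.get(name, 0.0), amax)
        return hook

    handles = [m.register_forward_hook(make_hook(n))
               for n, m in model.named_modules()
               if isinstance(m, nn.Linear)]
    with torch.no_grad():
        for tokens in prompts:
            for ratio in mask_ratios:           # vary mask state
                noisy = mask_tokens(tokens, ratio)
                for t in range(num_steps):      # vary timestep
                    # `denoise_step` is a hypothetical stand-in for
                    # the model's iterative generation step.
                    noisy = model.denoise_step(noisy, t)
    for h in handles:
        h.remove()
    return stats  # e.g. per-layer scale = amax / (2**(bits - 1) - 1)
```

A single-pass calibrator would see only one mask ratio and one timestep's activation distribution; sweeping both, as above, is the simplest way to avoid the scale mismatch the paper attributes to dynamic masking and iterative generation.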

Takeaways, Limitations

Takeaways:
DLLMQuant is a new PTQ framework for efficient deployment of DLLMs.
It addresses the accuracy and generalization losses that arise when existing PTQ techniques are applied to DLLMs.
Effective quantization tailored to DLLM characteristics is achieved through three novel techniques: TMAS, IA-AQ, and CGQ.
Experiments confirm DLLMQuant's performance gains and efficiency improvements.
Limitations:
The generalization of the proposed method requires further validation.
Applicability and performance across diverse DLLM architectures and model sizes remain to be analyzed.
A more detailed comparative analysis against other quantization techniques is needed.
Performance and stability in real-world deployment settings have yet to be evaluated.