Daily Arxiv

This is a page that curates AI-related papers published worldwide.
All content here is summarized using Google Gemini and operated on a non-profit basis.
Copyright for each paper belongs to the authors and their institutions; please make sure to credit the source when sharing.

SegQuant: A Semantics-Aware and Generalizable Quantization Framework for Diffusion Models

Created by
  • Haebom

Authors

Jiaji Zhang, Ruichao Sun, Hailiang Zhao, Jiaju Wu, Peng Chen, Hao Li, Xinkui Zhao, Kingsum Chow, Gang Xiong, Lin Ye, Shuiguang Deng

Outline

This paper proposes SegQuant, a Post-Training Quantization (PTQ) framework that addresses the deployment cost of computationally expensive diffusion models and can be applied to pre-trained models without retraining. SegQuant combines two techniques: SegLinear, a strategy that captures the semantics and spatial heterogeneity of the model structure, and DualScale, which preserves polarity-asymmetric activations that are important for maintaining the visual fidelity of generated results. The framework is applicable to a variety of models, is compatible with major deployment tools, and improves performance. It aims to overcome a limitation of existing PTQ methods, namely their dependence on a specific architecture, and thereby to be more general.
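The summary does not spell out how DualScale works internally. As a rough illustration only, the sketch below assumes that preserving polarity-asymmetric activations means quantizing negative and non-negative values with separate scales, so that a small negative range (as produced by SiLU-like functions) is not rounded away by a scale chosen for the much larger positive range. The function names, the 4-bit setting, and the sample values are all hypothetical and are not taken from the paper.

```python
def quantize_single_scale(values, num_bits=8):
    """Symmetric fake-quantization with one scale for the whole range."""
    qmax = 2 ** (num_bits - 1) - 1
    max_abs = max(abs(v) for v in values)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    return [round(v / scale) * scale for v in values]


def quantize_dual_scale(values, num_bits=8):
    """Fake-quantization with separate scales per polarity.

    Assumption (not from the paper): negative and non-negative values
    each get their own scale derived from their own maximum magnitude.
    """
    qmax = 2 ** (num_bits - 1) - 1
    pos_max = max((v for v in values if v >= 0), default=0.0)
    neg_max = max((-v for v in values if v < 0), default=0.0)
    pos_scale = pos_max / qmax if pos_max > 0 else 1.0
    neg_scale = neg_max / qmax if neg_max > 0 else 1.0
    quantized = []
    for v in values:
        scale = pos_scale if v >= 0 else neg_scale
        quantized.append(round(v / scale) * scale)
    return quantized


def mean_abs_error(reference, approx):
    """Average absolute difference between two equal-length lists."""
    return sum(abs(r - a) for r, a in zip(reference, approx)) / len(reference)


# Hypothetical SiLU-like activations: narrow negative range, wide positive range.
activations = [-0.27, -0.2, -0.1, -0.05, 0.0, 0.5, 2.0, 6.0]
single = quantize_single_scale(activations, num_bits=4)
dual = quantize_dual_scale(activations, num_bits=4)

print("single-scale error:", mean_abs_error(activations, single))
print("dual-scale error:  ", mean_abs_error(activations, dual))
```

With these sample values, the single-scale version rounds every negative activation to zero, while the dual-scale version keeps them distinct. The real method may differ substantially, for example in how the two polarities are handled inside the matrix multiplication.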

Takeaways, Limitations

Takeaways:
Presents a new quantization framework that overcomes the model-specific limitations of existing PTQ methods and improves generality.
Reduces model size and computational cost without compromising the performance of diffusion models, using the SegLinear and DualScale techniques.
Ensures compatibility with major deployment tools, allowing easy integration into industrial deployment pipelines.
Applicable to various diffusion models as well as Transformer-based models.
Limitations:
Further experiments are needed to determine how well SegQuant's performance generalizes to different diffusion models and datasets.
A deeper analysis of hyperparameter optimization of SegLinear and DualScale is needed.
Lack of comparative analysis with other advanced PTQ methods.