Daily Arxiv

This page curates AI-related papers published worldwide.
All content here is summarized using Google Gemini, and the page is operated on a non-profit basis.
Copyright for each paper belongs to the authors and their institutions; please make sure to credit the source when sharing.

PC-Sampler: Position-Aware Calibration of Decoding Bias in Masked Diffusion Models

Created by
  • Haebom

Authors

Pengcheng Huang, Shuhao Liu, Zhenghao Liu, Yukun Yan, Shuo Wang, Zulong Chen, Tong Xiao

Outline

This paper examines decoding strategies for masked diffusion models (MDMs). It identifies two shortcomings of existing uncertainty-based sampling methods, a lack of global trajectory control and a bias toward trivial tokens early in decoding, and proposes an improved decoding strategy, Position-Aware Confidence-Calibrated Sampling (PC-Sampler). PC-Sampler combines global trajectory planning with content-aware information maximization: a position-aware weighting mechanism regulates the decoding path, and a calibrated confidence score suppresses the premature selection of trivial tokens. In extensive experiments on three advanced MDMs across seven benchmarks, including logical reasoning and planning tasks, PC-Sampler outperforms existing MDM decoding strategies by more than 10% on average and significantly narrows the performance gap with state-of-the-art autoregressive models.
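
As a rough illustration of how a position-aware weight and a calibrated confidence score could be combined to choose which masked position to decode next, here is a minimal Python sketch. The summary does not give the paper's formulas, so everything specific here is an assumption made for illustration: the exponential position decay, the frequency-based calibration, the function names (`position_weight`, `calibrated_confidence`, `pick_next_position`), the `decay` and `cap` parameters, and the toy numbers.

```python
import math

def position_weight(index, decay=0.25):
    # Assumed form: exponential decay that favors earlier positions,
    # nudging decoding toward a roughly left-to-right trajectory.
    return math.exp(-decay * index)

def calibrated_confidence(prob, background_freq, cap=10.0):
    # Assumed form: scale the model probability by token rarity so that
    # frequent, trivial tokens (punctuation, filler words) score lower.
    return min(-prob * math.log(background_freq), cap)

def pick_next_position(candidates, decay=0.25):
    # candidates: (position, token, model_prob, background_freq) tuples,
    # one per still-masked position. Returns the best (position, token).
    best = max(
        candidates,
        key=lambda c: position_weight(c[0], decay) * calibrated_confidence(c[2], c[3]),
    )
    return best[0], best[1]

# Toy data (hypothetical): the period has the highest raw model probability.
candidates = [
    (0, "The", 0.60, 0.05),
    (7, ".", 0.95, 0.10),
    (1, "answer", 0.55, 0.001),
]

# Plain confidence-based sampling would unmask the trivial token first.
greedy_choice = max(candidates, key=lambda c: c[2])[:2]

# The position-weighted, calibrated score prefers an early, informative token.
calibrated_choice = pick_next_position(candidates)

print(greedy_choice, calibrated_choice)
```

In this toy setup, raw confidence selects the period at position 7, while the combined score selects "answer" at position 1. That is the kind of behavior the summary describes, although the actual scoring functions in the paper may differ.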

Takeaways, Limitations

Takeaways:
The paper identifies two shortcomings of existing MDM decoding strategies (lack of global trajectory control and an early-stage bias toward trivial tokens) and proposes a new decoding strategy, PC-Sampler, to address them.
PC-Sampler improves on existing decoding methods by more than 10% on average across the benchmarks, significantly narrowing the performance gap with autoregressive models.
The design combines a position-aware weighting mechanism with a calibrated confidence score.
It also performs well on complex tasks such as logical reasoning and planning.
Limitations:
The performance improvements of the proposed PC-Sampler may be limited to specific MDMs and benchmarks.
The computational cost and complexity of PC-Sampler are not analyzed.
Further research is needed on its applicability to other types of sequence generation models.