Daily Arxiv

This page curates AI-related papers published worldwide.
All content is summarized using Google Gemini, and the page is operated on a non-profit basis.
Copyright for each paper belongs to its authors and their institutions; please credit the source when sharing.

Diffusion Language Models Know the Answer Before Decoding

Created by
  • Haebom

Author

Pengxiang Li, Yefan Zhou, Dilxat Muhtar, Lu Yin, Shilin Yan, Li Shen, Yi Liang, Soroush Vosoughi, Shiwei Liu

Outline

Diffusion Language Models (DLMs) offer parallel sequence generation and flexible token ordering, but their inference is slower than that of autoregressive models because of the cost of bidirectional attention and the many refinement steps needed for high-quality output. This paper highlights a previously overlooked property of DLMs: early answer convergence. In many cases, the correct answer can be identified internally as early as halfway through the refinement steps. Based on this observation, the paper proposes Prophet, a fast, training-free decoding paradigm that enables early-commit decoding. Prophet uses the confidence gap between the top two prediction candidates to dynamically decide whether to continue refinement or to decode all remaining tokens at once. It integrates seamlessly with existing DLM implementations and requires no additional overhead or training. Experiments with LLaDA-8B and Dream-7B across a range of tasks show that Prophet reduces the number of decoding steps by up to 3.4x while maintaining high generation quality. This work reframes DLM decoding as the problem of deciding when to stop sampling, and shows that early decoding convergence is a simple yet powerful mechanism for accelerating DLM inference.
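The early-commit decision described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the commit rule here (commit when every remaining position's top-1/top-2 confidence gap exceeds a fixed threshold) and the `step_fn` refinement callable are simplifying assumptions for clarity.

```python
import numpy as np

def should_commit(logits: np.ndarray, gap_threshold: float = 5.0) -> bool:
    """Decide whether to stop refinement and decode all remaining tokens
    at once (the early-commit idea behind Prophet).

    `logits` has shape (num_masked_positions, vocab_size). The rule used
    here -- commit once every position's top-1 minus top-2 logit gap
    exceeds a threshold -- is an illustrative simplification of the
    paper's confidence-gap criterion.
    """
    top2 = np.sort(logits, axis=-1)[:, -2:]     # two largest logits per position
    gaps = top2[:, 1] - top2[:, 0]              # top-1 minus top-2 confidence gap
    return bool(np.all(gaps >= gap_threshold))  # confident everywhere -> commit

def decode_with_early_commit(step_fn, logits, max_steps=64, gap_threshold=5.0):
    """Run refinement steps, committing early once the model is confident.
    `step_fn` is a hypothetical callable that performs one refinement step
    and returns updated logits. Returns (token ids, steps actually run).
    """
    for step in range(max_steps):
        if should_commit(logits, gap_threshold):
            return logits.argmax(axis=-1), step  # decode everything now
        logits = step_fn(logits)
    return logits.argmax(axis=-1), max_steps
```

Because the check is a per-step threshold test on quantities the model already computes, it adds essentially no overhead, which is what makes the method training-free and drop-in.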

Takeaways, Limitations

Takeaways:
  • Presents a novel method that significantly improves decoding speed by exploiting the early answer convergence phenomenon of DLMs.
  • Integrates into existing DLM implementations efficiently, without additional training.
  • Maintains high generation quality while reducing the number of decoding steps by up to 3.4x.
  • Offers a new perspective on accelerating DLM inference by reframing decoding as the problem of deciding when to stop sampling.
Limitations:
  • The effectiveness of the proposed method may vary with the DLM and task used.
  • The confidence-gap criterion for early termination may leave room for further optimization, which warrants additional research.
  • The approach may be applicable only to certain types of DLMs.
  • More extensive experiments across different models and tasks are needed.