
Daily Arxiv

This is a page that curates AI-related papers published worldwide.
All content is summarized using Google Gemini, and the site is operated on a non-profit basis.
Copyright for each paper belongs to the authors and their institutions; please make sure to credit the source when sharing.

Plan for Speed: Dilated Scheduling for Masked Diffusion Language Models

Created by
  • Haebom

Author

Omer Luxembourg, Haim Permuter, Eliya Nachmani

Outline

Masked Diffusion Language Models (MDLMs) promise fast, non-autoregressive text generation, but existing samplers unmask positions in order of the model's confidence while ignoring interactions among the positions unmasked in parallel, which degrades quality or forces decoding back toward slow, near-autoregressive behavior. In this paper, we propose the Dilated Unmasking Scheduler (DUS). DUS is inference-only and requires no planner model: it partitions sequence positions into non-adjacent dilated groups and unmasks each group in parallel so as to minimize an upper bound on the joint entropy gain at each denoising step. By making the trade-off between the number of network calls and generation quality explicit, DUS recovers most of the performance lost by existing parallel unmasking strategies. On math (GSM8K, MATH500), code (HumanEval, MBPP), and general-knowledge benchmarks (BBH, MMLU-Pro), DUS outperforms confidence-based planners without modifying the underlying denoiser, revealing the true speed-quality frontier of MDLMs.
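To make the scheduling idea concrete, the sketch below illustrates one way to form non-adjacent (dilated) groups and unmask them one group per denoising step. This is only a minimal illustration under assumptions, not the authors' implementation: the stride-based grouping, the `denoiser` callable (returning per-position logits), `mask_id`, and the per-position argmax fill are all hypothetical choices made for clarity.

```python
# Minimal sketch of dilated group scheduling for parallel unmasking.
# NOT the paper's implementation: grouping rule, denoiser interface, and
# argmax fill are illustrative assumptions only.

def dilated_groups(num_positions: int, num_steps: int) -> list[list[int]]:
    """Partition sequence positions into dilated groups.

    Positions inside a group are spaced `num_steps` apart, so for
    num_steps >= 2 no two adjacent positions are unmasked in the same step.
    """
    return [list(range(g, num_positions, num_steps)) for g in range(num_steps)]


def dus_generate(denoiser, seq_len: int, num_steps: int, mask_id: int):
    """Unmask one dilated group per denoising step (one network call each)."""
    tokens = [mask_id] * seq_len
    for group in dilated_groups(seq_len, num_steps):
        logits = denoiser(tokens)  # single forward pass over the full sequence
        for pos in group:          # fill the whole group in parallel
            vocab_scores = logits[pos]
            tokens[pos] = max(range(len(vocab_scores)), key=lambda v: vocab_scores[v])
    return tokens
```

The intuition behind spacing the positions, as the outline describes, is that tokens unmasked in the same step are kept non-adjacent, so the interactions ignored by parallel unmasking are weaker, which is what the upper bound on joint entropy gain is meant to control.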

Takeaways, Limitations

Takeaways: The Dilated Unmasking Scheduler (DUS) overcomes the limitations of existing parallel unmasking strategies and significantly improves the speed-quality trade-off of masked diffusion language models. It outperforms confidence-based planners on a range of benchmarks and establishes a new speed-quality frontier for MDLMs. Notably, the improvement is achieved without modifying the underlying denoiser.
Limitations: Although the paper validates DUS on a range of benchmarks, further research is needed to determine whether it generalizes to all types of text generation tasks. The paper also offers limited analysis of DUS's parameter settings and their optimization, which may complicate practical application. Finally, it remains to be shown whether DUS applies to all MDLMs or depends on a specific model architecture.