Daily Arxiv

This page curates AI-related papers published worldwide.
All content is summarized using Google Gemini, and the site is operated on a non-profit basis.
Copyright for each paper belongs to the authors and their institutions; please make sure to credit the source when sharing.

Discrete Diffusion in Large Language and Multimodal Models: A Survey

Created by
  • Haebom

Authors

Runpeng Yu, Qi Li, Xinchao Wang

Outline

This paper presents a systematic survey of discrete diffusion large language models (dLLMs) and discrete diffusion multimodal large language models (dMLLMs). Unlike autoregressive (AR) models, dLLMs and dMLLMs adopt a multi-token, parallel decoding paradigm built on full attention and denoising-based generation. This paradigm naturally enables parallel generation, fine-grained output control, and dynamic, response-aware perception, capabilities that were previously difficult to achieve with AR models. Recently, numerous industrial-scale proprietary and open-source academic d(M)LLMs have achieved performance comparable to autoregressive models while improving inference speed by up to 10x.

The progress of discrete diffusion LLMs and MLLMs has been driven primarily by two developments. The first is the maturation of autoregressive LLMs and MLLMs, which has accumulated vast amounts of data, benchmarks, and underlying infrastructure for training and inference. The second is the advancement of the mathematical models underlying discrete diffusion. Together, these advances led to a surge in dLLM and dMLLM research in early 2025.

This paper provides a comprehensive overview of research in the dLLM and dMLLM area: it traces their historical development, formalizes the underlying mathematical framework, and categorizes representative models. It also analyzes key techniques for training and inference, summarizes emerging applications across the language, vision-language, and biological domains, and concludes with future directions for research and deployment.
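To make the denoising paradigm concrete, here is one common formulation of the forward (noising) process, the absorbing or masked variant of discrete diffusion. This is a representative choice for illustration, not necessarily the exact parameterization every surveyed model uses: each token of a clean sequence x_0 is independently replaced by a special mask token [M] with probability t.

```latex
% Masked (absorbing-state) forward process with a linear schedule, t in [0, 1]
q(x_t \mid x_0) = \prod_{i=1}^{L} \mathrm{Cat}\!\left(x_t^i;\ (1 - t)\,\delta_{x_0^i} + t\,\delta_{[\mathrm{M}]}\right)
```

The reverse model is trained to predict the clean tokens from a partially masked sequence, which at inference time yields the parallel decoding loop described above: start from an all-mask sequence and iteratively reveal tokens. Below is a minimal, self-contained Python sketch of that loop. The `denoiser` stub, the toy vocabulary sizes, and the confidence-based reveal schedule are illustrative assumptions (confidence-ranked unmasking is one common scheme), not the survey's specific algorithm.

```python
import numpy as np

VOCAB_SIZE = 32          # toy vocabulary (a real dLLM uses tens of thousands of tokens)
MASK_ID = VOCAB_SIZE     # reserve an id outside the vocabulary for [MASK]
SEQ_LEN = 16
NUM_STEPS = 4            # far fewer steps than tokens -> parallel speedup
rng = np.random.default_rng(0)

def denoiser(tokens):
    """Stand-in for a trained bidirectional transformer: returns logits over
    the vocabulary for every position at once (full attention, no causal
    mask). Random here, purely for illustration."""
    return rng.normal(size=(len(tokens), VOCAB_SIZE))

def parallel_decode():
    tokens = np.full(SEQ_LEN, MASK_ID)            # start from a fully masked sequence
    for step in range(NUM_STEPS):
        logits = denoiser(tokens)                 # predict all positions in parallel
        probs = np.exp(logits - logits.max(-1, keepdims=True))
        probs /= probs.sum(-1, keepdims=True)     # softmax per position
        pred = probs.argmax(-1)                   # greedy token per position
        conf = probs.max(-1)                      # confidence of each prediction
        masked = tokens == MASK_ID
        # Reveal the most confident masked positions; keep the rest masked
        # for refinement in later steps (one common remasking schedule).
        k = int(np.ceil(masked.sum() / (NUM_STEPS - step)))
        order = np.argsort(np.where(masked, -conf, np.inf))
        tokens[order[:k]] = pred[order[:k]]
    return tokens

print(parallel_decode())
```

Because each step fills in several positions at once, the number of model calls scales with NUM_STEPS rather than the sequence length, which is where the up-to-10x speedups over token-by-token AR decoding come from.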

Takeaways, Limitations

Takeaways:
dLLMs and dMLLMs offer advantages over AR models, including parallel generation, fine-grained output control, and dynamic, response-aware perception.
dLLMs and dMLLMs achieve up to 10x inference speedups compared to AR models.
This paper provides a comprehensive overview of the historical development of dLLMs and dMLLMs, their mathematical frameworks, representative models, training and inference techniques, and various applications.
It presents future directions for dLLM and dMLLM research.
Limitations:
The paper offers a general overview rather than an in-depth analysis of specific models or applications.
Detailed discussion of the pros and cons of dLLMs and dMLLMs relative to AR models may be lacking.
The suggested future research directions may lack specificity.