This page curates AI-related papers published worldwide. All content is summarized using Google Gemini, and the site is run on a non-profit basis. Copyright for each paper belongs to its authors and their institutions; please credit the source when sharing.
Discrete Diffusion in Large Language and Multimodal Models: A Survey
Created by
Haebom
Author
Runpeng Yu, Qi Li, Xinchao Wang
Outline
This paper presents a systematic survey of discrete diffusion large language models (dLLMs) and discrete diffusion multimodal large language models (dMLLMs). Unlike autoregressive (AR) models, dLLMs and dMLLMs adopt a multi-token, parallel decoding paradigm that uses full attention and a denoising-based generation strategy. This paradigm naturally enables parallel generation, fine-grained output control, and dynamic perception, features that were previously difficult to achieve with AR models. Many industrial-scale proprietary d(M)LLMs, as well as numerous open-source academic d(M)LLMs, have demonstrated performance comparable to autoregressive models while delivering inference speeds up to an order of magnitude faster. These advances position discrete diffusion models as a promising alternative to the traditional autoregressive approach. The paper provides a comprehensive overview of research on dLLMs and dMLLMs: it traces their historical development, formalizes the underlying mathematical framework, and categorizes representative models. It also analyzes core techniques for training and inference, and summarizes emerging applications in areas such as language, vision-language, and biology. Finally, it discusses future directions for research and deployment. Related papers are collected at https://github.com/LiQiiiii/Awesome-Discrete-Diffusion-LLM_MLLM .
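To make the decoding paradigm concrete, below is a minimal sketch of mask-based parallel decoding in PyTorch. It assumes a masked-diffusion-style dLLM; the `model` callable and `MASK_ID` are hypothetical stand-ins, not an API from the survey, and `model` is assumed to be a full-attention denoiser that returns vocabulary logits for every position at once.

```python
import torch

MASK_ID = 0  # hypothetical id of the special [MASK] token

@torch.no_grad()
def diffusion_decode(model, prompt_ids, gen_len=64, steps=8):
    """Start from a fully masked canvas after the prompt and, at each
    denoising step, commit the most confident predictions in parallel."""
    device = prompt_ids.device
    masks = torch.full((gen_len,), MASK_ID, dtype=torch.long, device=device)
    canvas = torch.cat([prompt_ids, masks])
    for step in range(steps):
        masked = canvas == MASK_ID
        if not masked.any():
            break
        logits = model(canvas.unsqueeze(0)).squeeze(0)  # (seq_len, vocab_size)
        conf, pred = logits.softmax(dim=-1).max(dim=-1)
        # Spread the remaining masked positions evenly over the steps left.
        k = max(1, int(masked.sum().item()) // (steps - step))
        conf[~masked] = -1.0  # never overwrite already-committed tokens
        top = conf.topk(k).indices
        canvas[top] = pred[top]
    return canvas
```

Because all positions are predicted jointly at every step, the number of forward passes is governed by `steps` rather than by the output length, which is the source of the speedups summarized below. Real dLLMs use more elaborate unmasking schedules and remasking strategies than this confidence-based heuristic.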
•
Takeaways:
◦
Shows that discrete diffusion models can achieve inference speeds up to 10 times faster than autoregressive models.
◦
Provides capabilities that are difficult to achieve with autoregressive models, such as parallel generation, fine-grained output control, and dynamic perception (see the infilling sketch after this list).
◦
Suggests potential applications in various fields (language, vision-language, biology, etc.).
◦
Provides a systematic survey and taxonomy of dLLMs and dMLLMs.
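As a companion to the decoding sketch above, here is a hedged illustration of fine-grained output control: because the denoiser attends over the whole sequence, any unmasked token in the template acts as a hard constraint, so text can be infilled at arbitrary positions rather than only appended left to right. It reuses the hypothetical `model` and `MASK_ID` from the previous sketch.

```python
import torch

MASK_ID = 0  # hypothetical [MASK] id, as in the sketch above

@torch.no_grad()
def infill(model, template_ids, steps=8):
    """Fill MASK_ID slots anywhere in `template_ids` (prefix, middle, or
    suffix) while leaving the given tokens untouched: arbitrary-position
    control that left-to-right AR decoding cannot express directly."""
    canvas = template_ids.clone()
    for step in range(steps):
        masked = canvas == MASK_ID
        if not masked.any():
            break
        logits = model(canvas.unsqueeze(0)).squeeze(0)
        conf, pred = logits.softmax(dim=-1).max(dim=-1)
        conf[~masked] = -1.0  # fixed template tokens stay fixed
        k = max(1, int(masked.sum().item()) // (steps - step))
        top = conf.topk(k).indices
        canvas[top] = pred[top]
    return canvas
```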
•
Limitations:
◦
The paper does not explicitly discuss its own limitations.
◦
The survey may lack detailed analysis comparing the performance of the presented models.
◦
Discussion of future research directions could be more specific.