Daily Arxiv

This is a page that curates AI-related papers published worldwide.
All content here is summarized using Google Gemini and operated on a non-profit basis.
Copyright for each paper belongs to the authors and their institutions; please make sure to credit the source when sharing.

Text-to-Level Diffusion Models With Various Text Encoders for Super Mario Bros

Created by
  • Haebom

Author

Jacob Schrum, Olivia Kilday, Emilio Salas, Bess Hagan, Reid Williams

Outline

This paper uses diffusion models to generate tile-based game levels from text. Whereas previous studies focused on unconditional level generation, this study conditions generation on text input. To do so, the authors present a strategy for automatically assigning captions to existing datasets and train diffusion models with both pretrained text encoders and a simple Transformer model trained from scratch. They evaluate the diversity and playability of the generated levels and compare the results with unconditional diffusion models, generative adversarial networks (GANs), the Five-Dollar Model, and MarioGPT. Notably, the diffusion model that uses the simple Transformer encoder outperforms the models that use more complex text encoders while requiring less training time, which suggests that a large language model is unnecessary for this task. Finally, the authors provide a GUI that lets users assemble longer levels from generated level fragments.
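
The automatic captioning idea mentioned above can be illustrated with a minimal sketch. It assumes a level is a list of equal-length strings and uses a hypothetical tile alphabet ("E" enemy, "o" coin, "?" question block, "<" pipe, "X" solid, "-" empty). The tile symbols, the quantity thresholds, and the caption wording are all assumptions made for illustration, not the paper's actual captioning rules.

```python
from collections import Counter

# Hypothetical tile alphabet: symbol -> (singular, plural).
# This is an illustrative assumption, not the encoding used in the paper.
TILE_NAMES = {
    "E": ("enemy", "enemies"),
    "o": ("coin", "coins"),
    "?": ("question block", "question blocks"),
    "<": ("pipe", "pipes"),
}
EMPTY = "-"  # assumed symbol for an empty (sky) tile


def caption_level(rows):
    """Build a rule-based caption from tile counts in a level scene."""
    counts = Counter(ch for row in rows for ch in row)
    phrases = []
    for symbol, (singular, plural) in TILE_NAMES.items():
        n = counts[symbol]
        if n == 0:
            continue  # do not mention tiles that are absent
        if n == 1:
            phrases.append(f"one {singular}")
        elif n <= 3:  # assumed threshold for "a few"
            phrases.append(f"a few {plural}")
        else:
            phrases.append(f"many {plural}")
    # Describe the floor by checking the bottom row for empty tiles.
    phrases.append("floor with gaps" if EMPTY in rows[-1] else "full floor")
    return ". ".join(phrases) + "."


EXAMPLE_LEVEL = [
    "--------",
    "--oooo--",
    "---?----",
    "-E---E--",
    "XXXX-XXX",
]

print(caption_level(EXAMPLE_LEVEL))
# a few enemies. many coins. one question block. floor with gaps.
```

Captions produced this way are deterministic, so every scene in an existing dataset can be paired with text without manual labeling.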

Takeaways, Limitations

Takeaways:
Demonstrates the utility of diffusion models for text-conditioned game level generation.
Shows that effective text-to-level generative models can be built without a large language model.
Provides a GUI for building longer levels by connecting generated level fragments.
Makes more efficient use of existing datasets through an automatic caption assignment strategy.
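
Connecting generated fragments into a longer level can be sketched as a row-wise concatenation. This is only an illustration under the assumption that each fragment is a list of equal-length strings with the same height. The function name `stitch_fragments` is hypothetical and does not describe the GUI's actual implementation.

```python
def stitch_fragments(fragments):
    """Join level fragments left to right into one longer level.

    Each fragment is assumed to be a list of row strings; all fragments
    must share the same height so that rows line up.
    """
    if not fragments:
        return []
    height = len(fragments[0])
    if any(len(fragment) != height for fragment in fragments):
        raise ValueError("all fragments must have the same number of rows")
    # Concatenate the i-th row of every fragment, in order.
    return ["".join(fragment[i] for fragment in fragments) for i in range(height)]


left = ["----", "-o--", "XXXX"]
right = ["----", "--E-", "XX-X"]
for row in stitch_fragments([left, right]):
    print(row)
```

A practical tool would also need to check that the seam between fragments remains playable, which this sketch does not attempt.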
Limitations:
Lack of detailed description of the dataset used and the characteristics of the game genre.
The criteria for evaluating the playability of generated levels are unclear.
Lack of detailed description of the GUI's features and usability.
A more in-depth comparative analysis with other text-to-level generative models is needed.