Daily Arxiv

This page curates AI-related papers published worldwide.
All summaries are generated with Google Gemini, and the page is operated on a non-profit basis.
Copyright for each paper belongs to the authors and their institutions; please make sure to credit the source when sharing.

Scaffold Diffusion: Sparse Multi-Category Voxel Structure Generation with Discrete Diffusion

Created by
  • Haebom

Author

Justin Jung

Outline

This paper proposes Scaffold Diffusion, a generative model for sparse multi-category 3D voxel structures. Generating such structures is difficult for two reasons: the memory cost of a voxel grid scales cubically with its resolution, and sparsity causes severe class imbalance. Scaffold Diffusion treats voxels as tokens and generates 3D voxel structures with a discrete diffusion language model. The authors show that discrete diffusion language models can be extended beyond inherently sequential domains such as text to generate spatially coherent 3D structures. In evaluations on Minecraft house structures from the 3D-Craft dataset, Scaffold Diffusion, unlike existing baseline models and autoregressive formulations, generates realistic and coherent structures even when trained on data with over 98% sparsity. The authors also provide an interactive viewer for the generated samples and the generation process (https://scaffold.deepexploration.org/).
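The idea of treating voxels as tokens can be illustrated with a small sketch. The summary above does not specify the tokenization or the noise process, so everything here is an assumption made for illustration: the names `AIR`, `MASK`, `voxels_to_tokens`, and `mask_tokens` are hypothetical, and the corruption step is the masked (absorbing-state) variant of discrete diffusion, which may differ from what the paper uses.

```python
import random

AIR = 0    # hypothetical category id for empty space
MASK = -1  # hypothetical id for the absorbing "masked" state


def voxels_to_tokens(grid):
    """Turn a dense grid (nested lists indexed [x][y][z]) into a list of
    ((x, y, z), category) tokens, keeping only occupied cells."""
    return [
        ((x, y, z), cat)
        for x, plane in enumerate(grid)
        for y, row in enumerate(plane)
        for z, cat in enumerate(row)
        if cat != AIR
    ]


def mask_tokens(tokens, t, rng):
    """Forward corruption at noise level t in [0, 1]: each token's category
    is independently replaced by MASK with probability t."""
    return [(pos, MASK if rng.random() < t else cat) for pos, cat in tokens]


# Tiny 3x3x3 example: a floor of category 1 with one category-2 block on top.
grid = [[[AIR] * 3 for _ in range(3)] for _ in range(3)]
for x in range(3):
    for z in range(3):
        grid[x][0][z] = 1
grid[1][1][1] = 2

tokens = voxels_to_tokens(grid)
noisy = mask_tokens(tokens, t=0.5, rng=random.Random(0))
```

In a setup like this, a denoising model would be trained to recover the original categories at the masked positions; that model is not shown here.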

Takeaways, Limitations

Takeaways:
Presents an effective new method for generating sparse multi-category 3D voxel structures.
Demonstrates that discrete diffusion language models can be extended to generate spatial structures.
Generates realistic and coherent 3D structures even when trained on data with over 98% sparsity.
Provides an interactive viewer for visualizing the generated samples and the generation process.
Limitations:
Further research is needed to determine how well the method generalizes to other types of 3D data or to more complex structures.
Evaluation results are reported only for the 3D-Craft dataset, so the method still needs to be validated on other datasets.
The method does not directly solve the memory scaling problem; it works around it by using a discrete diffusion language model.
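For a sense of scale, the cubic memory scaling and the 98% sparsity figure can be worked through with simple arithmetic. The grid side lengths below are hypothetical and are not taken from the paper or the 3D-Craft dataset; the helper names are likewise illustrative.

```python
def dense_cells(side):
    """Number of cells in a dense cubic grid with the given side length."""
    return side ** 3


def sparsity(occupied, side):
    """Fraction of cells in the dense grid that are empty."""
    return 1.0 - occupied / dense_cells(side)


# Hypothetical side lengths; doubling the side multiplies the cell count by 8.
for side in (16, 32, 64):
    cells = dense_cells(side)
    max_occupied = cells // 50  # at most 2% occupied means at least 98% sparsity
    print(side, cells, max_occupied, round(sparsity(max_occupied, side), 4))
```

The point of the sketch is that the dense cell count grows eightfold with each doubling of resolution, while at 98% sparsity only about one cell in fifty carries a non-empty category.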