Daily Arxiv

This page curates AI-related papers published worldwide.
All content is summarized using Google Gemini, and the site is operated on a non-profit basis.
Copyright for each paper belongs to the authors and their institutions; please make sure to credit the source when sharing.

SceneGen: Single-Image 3D Scene Generation in One Feedforward Pass

Created by
  • Haebom

Authors

Yanxu Meng, Haoning Wu, Ya Zhang, Weidi Xie

Outline

SceneGen is a novel framework that takes a single scene image and its corresponding object masks as input and simultaneously generates multiple 3D assets with both geometry and texture. It operates without per-scene optimization or asset retrieval, introducing a feature aggregation module that integrates local and global scene information from visual and geometric encoders to produce the 3D assets and their relative spatial positions in a single feedforward pass. Although trained on single-image inputs, it extends directly to multi-image input scenarios, and quantitative and qualitative evaluations demonstrate its efficiency and robust generation capabilities. It offers a new solution to the challenging problem of 3D content generation for applications such as VR/AR and embodied AI.
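To make the pipeline concrete, below is a minimal PyTorch sketch of a single-feedforward design like the one described: per-object (local) features derived from the masks are fused with scene-level (global) image features in an aggregation block, and per-object heads predict a 3D latent and a relative pose. All module names, shapes, and heads (FeatureAggregator, asset_head, pose_head, etc.) are illustrative assumptions, not the paper's actual architecture.

```python
import torch
import torch.nn as nn

class FeatureAggregator(nn.Module):
    """Hypothetical aggregation block: fuses per-object (local) features
    with scene-level (global) features via cross-attention."""
    def __init__(self, dim: int = 256, heads: int = 8):
        super().__init__()
        self.cross_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, local_feats: torch.Tensor, global_feats: torch.Tensor) -> torch.Tensor:
        # local_feats:  (num_objects, tokens, dim) -- per-object tokens
        # global_feats: (1, scene_tokens, dim)     -- whole-scene context
        ctx = global_feats.expand(local_feats.size(0), -1, -1)
        fused, _ = self.cross_attn(local_feats, ctx, ctx)
        return self.norm(local_feats + fused)

class SceneGenSketch(nn.Module):
    """Single feedforward pass: image + masks -> per-object 3D latents and poses.
    Stand-in encoders and heads; shapes are illustrative only."""
    def __init__(self, dim: int = 256, latent_dim: int = 512):
        super().__init__()
        # Stand-ins for the visual and geometric encoders.
        self.visual_enc = nn.Sequential(nn.Conv2d(3, dim, 8, 8), nn.Flatten(2))
        self.geom_enc = nn.Sequential(nn.Conv2d(1, dim, 8, 8), nn.Flatten(2))
        self.aggregate = FeatureAggregator(dim)
        self.asset_head = nn.Linear(dim, latent_dim)  # per-object 3D latent
        self.pose_head = nn.Linear(dim, 7)            # translation + quaternion

    def forward(self, image: torch.Tensor, masks: torch.Tensor):
        # image: (3, H, W); masks: (num_objects, H, W), one mask per object
        global_feats = self.visual_enc(image.unsqueeze(0)).transpose(1, 2)
        local_feats = self.geom_enc(masks.unsqueeze(1)).transpose(1, 2)
        fused = self.aggregate(local_feats, global_feats)
        pooled = fused.mean(dim=1)  # one vector per object
        return self.asset_head(pooled), self.pose_head(pooled)

# Usage: one forward pass yields latents and relative poses for all objects.
model = SceneGenSketch()
image = torch.randn(3, 256, 256)
masks = torch.rand(4, 256, 256)  # four object masks
latents, poses = model(image, masks)
print(latents.shape, poses.shape)  # torch.Size([4, 512]) torch.Size([4, 7])
```

The point this sketch mirrors is that all objects are processed jointly in one pass, so their relative positions can be predicted consistently without per-asset optimization.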

Takeaways, Limitations

Takeaways:
• Presents an efficient method for generating multiple 3D assets from a single image without optimization or asset retrieval.
• Enables accurate 3D asset generation and spatial position prediction through a feature aggregation module that integrates local and global information.
• Extends directly to multi-image inputs, enabling improved performance.
• Offers a new paradigm for high-quality 3D content creation, applicable to fields such as VR/AR and embodied AI.
Limitations:
• The paper does not explicitly discuss its limitations. Further research may be needed to evaluate generalization to diverse scene types and complex objects, as well as the accuracy and level of detail of the generated 3D models.