Daily Arxiv

This is a page that curates AI-related papers published worldwide.
All content here is summarized using Google Gemini and operated on a non-profit basis.
Copyright for each paper belongs to the authors and their institutions; please make sure to credit the source when sharing.

ArtiScene: Language-Driven Artistic 3D Scene Generation Through Image Intermediary

Created by
  • Haebom

Author

Zeqi Gu, Yin Cui, Zhaoshuo Li, Fangyin Wei, Yunhao Ge, Jinwei Gu, Ming-Yu Liu, Abe Davis, Yifan Ding

Outline

In this paper, we propose ArtiScene, a novel pipeline that uses a text-to-image model as an intermediary for text-driven 3D scene generation. To work around the scarcity of high-quality 3D training data that limits existing text-to-3D models, ArtiScene first generates a 2D image from the text prompt, extracts each object's shape, appearance, and location from that image, generates a 3D model per object, and assembles the results into the final 3D scene. ArtiScene handles diverse scenes and styles, and outperforms existing state-of-the-art methods on quantitative metrics, in user studies, and in GPT-4 evaluations.
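The pipeline described above can be sketched as follows. This is a minimal illustration of the stage structure only: every function name, the stub detections, and the returned data shapes are hypothetical placeholders, not the authors' actual implementation.

```python
from dataclasses import dataclass

@dataclass
class DetectedObject:
    category: str    # object class extracted from the 2D image
    bbox_2d: tuple   # (x, y, w, h) location in the intermediary image
    appearance: str  # style/appearance description for the object

def generate_intermediary_image(prompt: str) -> str:
    """Stage 1: a text-to-image model renders the scene as a 2D image.
    (Placeholder: returns a fake image handle.)"""
    return f"image_of({prompt})"

def extract_objects(image: str) -> list[DetectedObject]:
    """Stage 2: extract shape, appearance, and location per object.
    (Placeholder detections for illustration only.)"""
    return [
        DetectedObject("sofa", (10, 40, 50, 20), "blue velvet"),
        DetectedObject("lamp", (70, 30, 10, 25), "brass, art deco"),
    ]

def lift_to_3d(obj: DetectedObject) -> dict:
    """Stage 3: generate a 3D asset for each detected object.
    (Placeholder: a dict stands in for a mesh.)"""
    return {"category": obj.category, "mesh": f"mesh_of({obj.category})"}

def assemble_scene(assets: list[dict], objects: list[DetectedObject]) -> dict:
    """Stage 4: place each asset using the location estimated from the image."""
    return {
        "objects": [
            {**asset, "position": obj.bbox_2d[:2]}
            for asset, obj in zip(assets, objects)
        ]
    }

def artiscene(prompt: str) -> dict:
    image = generate_intermediary_image(prompt)
    objects = extract_objects(image)
    assets = [lift_to_3d(o) for o in objects]
    return assemble_scene(assets, objects)

scene = artiscene("a cozy art-deco living room")
print([o["category"] for o in scene["objects"]])  # → ['sofa', 'lamp']
```

Note that because the 2D image is the single source of truth for layout, the assembly stage here only reuses the image-space locations; recovering true 3D placement from them is exactly where the accuracy loss noted under Limitations can occur.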

Takeaways, Limitations

Takeaways:
By leveraging the strong performance of text-to-image models, the approach greatly improves the efficiency and quality of 3D scene generation.
It provides an automated, training-free pipeline that can generate a variety of scenes and styles.
Its performance was verified through quantitative metrics and user studies, showing superior results compared to existing methods.
Limitations:
Accuracy may be lost when extracting 3D information from 2D images.
Complex 3D scenes or subtle object interactions may be difficult to represent faithfully.
Overall quality may depend on the performance of the underlying text-to-image model.