In this paper, we propose Ultra3D, a framework that improves the efficiency of 3D content generation based on sparse voxel representations. Existing two-stage diffusion pipelines suffer from severe computational inefficiency because of the quadratic complexity of the attention mechanism. In the first stage, Ultra3D uses the VecSet representation to generate the object layout efficiently, reducing the number of tokens and thereby accelerating voxel coordinate prediction. In the second stage, we introduce Part Attention, a geometry-aware attention mechanism that restricts attention computation to semantically consistent part regions, preserving structural continuity while avoiding unnecessary global attention. This design achieves up to a 6.7x speed-up in latent generation, supports high-resolution 3D generation at 1024 resolution, and achieves state-of-the-art performance in both visual fidelity and user preference. To supply the part labels this mechanism relies on, we also build a scalable part annotation pipeline that converts raw meshes into part-labeled sparse voxels.
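The idea of restricting attention to part regions can be illustrated with a minimal sketch. It assumes that every voxel token carries an integer part label and that a token may attend only to tokens with the same label; the function names `part_attention` and `attention_pairs` are hypothetical, and this is a plain-Python illustration of label-masked attention, not the authors' implementation. It also counts query-key score computations, showing why partitioning helps: global attention over N tokens needs N² scores, whereas splitting them into k equal parts needs only k·(N/k)² = N²/k.

```python
import math
from collections import defaultdict


def part_attention(queries, keys, values, part_ids):
    """Scaled dot-product attention where each token attends only to
    tokens sharing its part label (hypothetical illustration)."""
    # Group token indices by their part label.
    groups = defaultdict(list)
    for i, p in enumerate(part_ids):
        groups[p].append(i)

    scale = 1.0 / math.sqrt(len(queries[0]))
    dim_v = len(values[0])
    out = [None] * len(queries)
    for idx in groups.values():
        for i in idx:
            # Scores are computed only against keys in the same part.
            scores = [scale * sum(q * k for q, k in zip(queries[i], keys[j]))
                      for j in idx]
            # Numerically stable softmax over the part-local scores.
            m = max(scores)
            weights = [math.exp(s - m) for s in scores]
            total = sum(weights)
            out[i] = [sum(w * values[j][c] for w, j in zip(weights, idx)) / total
                      for c in range(dim_v)]
    return out


def attention_pairs(part_ids):
    """Number of query-key scores computed: the sum of squared part sizes."""
    counts = defaultdict(int)
    for p in part_ids:
        counts[p] += 1
    return sum(n * n for n in counts.values())


if __name__ == "__main__":
    ids = [0, 0, 0, 0, 1, 1, 1, 1]      # 8 tokens split into 2 parts
    print(attention_pairs([0] * 8))      # global attention: 64 scores
    print(attention_pairs(ids))          # part-restricted: 32 scores
```

With two equal parts the score count halves (64 to 32); the measured speed-up reported above depends on the actual part distribution and implementation, which this sketch does not model. A practical version would batch each part on the GPU rather than loop in Python.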