This paper proposes HAVIR, a novel model for reconstructing visual information from brain activity. Inspired by the hierarchical representation theory of the visual cortex, HAVIR divides the visual cortex into two hierarchical regions and extracts distinct features from each region. Specifically, the Structural Generator extracts structural information from spatially processed voxels and transforms it into a latent diffusion dictionary, while the Semantic Extractor transforms semantically processed voxels into a CLIP embedding. These are integrated through a Versatile Diffusion model to synthesize the final image. Experimental results demonstrate that HAVIR improves the quality of structural and semantic reconstruction even in complex scenes, outperforming existing models.