Daily Arxiv

This page organizes papers related to artificial intelligence published around the world.
This page is summarized using Google Gemini and is operated on a non-profit basis.
The copyright of the paper belongs to the author and the relevant institution. When sharing, simply cite the source.

HAVIR: Hierarchical Vision to Image Reconstruction using CLIP-Guided Versatile Diffusion

Created by
  • Haebom

Authors

Shiyi Zhang, Dong Liang, Hairong Zheng, Yihang Zhou

The HAVIR Model: Reconstructing Visual Information from Brain Activity

Outline

This paper proposes HAVIR, a model for reconstructing visual information from brain activity. Inspired by the theory of hierarchical representation in the visual cortex, HAVIR divides the visual cortex into two hierarchical regions and extracts different features from each. The Structural Generator extracts structural information from voxels in the spatial-processing region and converts it into a latent diffusion prior. The Semantic Extractor converts voxels from the semantic-processing region into a CLIP embedding. A Versatile Diffusion model then combines the two outputs to synthesize the final image. Experiments show that HAVIR improves both the structural and the semantic quality of reconstructions, even for complex scenes, and outperforms existing models.
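The two-branch design can be made concrete with a minimal sketch on synthetic data. This is not the authors' implementation. It uses plain ridge regression as a stand-in for both the Structural Generator and the Semantic Extractor, since the summary does not describe the real architectures. The split of voxels into a "spatial" half and a "semantic" half is hypothetical, and so are all dimensions. The Versatile Diffusion step is not reproduced; the sketch stops once it has produced the two conditioning signals.

```python
import numpy as np

# All sizes below are hypothetical placeholders, not values from the paper.
N_TRAIN, N_VOXELS = 400, 100
LATENT_DIM, CLIP_DIM = 16, 8

rng = np.random.default_rng(0)

# Hypothetical ROI split: the first half of the voxels stands in for the
# spatial-processing (lower) region, the second half for the semantic (higher) region.
SPATIAL_IDX = np.arange(0, N_VOXELS // 2)
SEMANTIC_IDX = np.arange(N_VOXELS // 2, N_VOXELS)


def ridge_fit(X, Y, alpha=1.0):
    """Closed-form ridge regression: W = (X^T X + alpha * I)^-1 X^T Y."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + alpha * np.eye(d), X.T @ Y)


# Synthetic "ground-truth" linear maps, used only to generate toy targets.
TRUE_STRUCT = rng.standard_normal((len(SPATIAL_IDX), LATENT_DIM))
TRUE_SEM = rng.standard_normal((len(SEMANTIC_IDX), CLIP_DIM))


def make_data(n):
    """Synthetic voxel responses plus the target each region should predict."""
    X = rng.standard_normal((n, N_VOXELS))
    Z_latent = X[:, SPATIAL_IDX] @ TRUE_STRUCT + 0.1 * rng.standard_normal((n, LATENT_DIM))
    Z_clip = X[:, SEMANTIC_IDX] @ TRUE_SEM + 0.1 * rng.standard_normal((n, CLIP_DIM))
    return X, Z_latent, Z_clip


X_train, Z_latent_train, Z_clip_train = make_data(N_TRAIN)

# Structural branch (stand-in for the Structural Generator): spatial voxels -> latent prior.
W_struct = ridge_fit(X_train[:, SPATIAL_IDX], Z_latent_train)
# Semantic branch (stand-in for the Semantic Extractor): semantic voxels -> CLIP-like embedding.
W_sem = ridge_fit(X_train[:, SEMANTIC_IDX], Z_clip_train)


def decode(X):
    """Map voxel responses of shape (n, N_VOXELS) to the two conditioning signals.

    In HAVIR both outputs are passed to Versatile Diffusion to synthesize the
    image; that generative step is not reproduced here.
    """
    latent_prior = X[:, SPATIAL_IDX] @ W_struct
    clip_embedding = X[:, SEMANTIC_IDX] @ W_sem
    return latent_prior, clip_embedding


if __name__ == "__main__":
    X_test, Z_latent_test, Z_clip_test = make_data(50)
    latent_pred, clip_pred = decode(X_test)
    r_latent = np.corrcoef(latent_pred.ravel(), Z_latent_test.ravel())[0, 1]
    r_clip = np.corrcoef(clip_pred.ravel(), Z_clip_test.ravel())[0, 1]
    print(f"latent r = {r_latent:.3f}, clip r = {r_clip:.3f}")
```

The key design point the sketch illustrates is that each branch reads only from its own region. The structural target never sees semantic voxels, and the semantic target never sees spatial voxels. A real system would replace the linear maps with trained networks and feed the outputs into the diffusion model.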

Takeaways, Limitations

Takeaways:
Higher structural and semantic reconstruction quality than existing models for reconstructing visual information from brain activity.
Mimicking the hierarchical structure of the visual cortex improves the reconstruction of complex visual stimuli.
A novel approach that separates structural and semantic information and processes each through its own pathway.
Limitations:
The paper's specific limitations are not stated (they are not included in the abstract).
Further research is needed to determine the model's generalizability and applicability to various visual stimuli.
The complexity of the HAVIR model may lead to high computational cost and make training difficult.