Daily Arxiv

This page curates AI-related papers published worldwide.
All content here is summarized using Google Gemini, and the page is operated on a non-profit basis.
Copyright for each paper belongs to the authors and their institutions; please make sure to credit the source when sharing.

HAVIR: Hierarchical Vision to Image Reconstruction using CLIP-Guided Versatile Diffusion

Created by
  • Haebom

Authors

Shiyi Zhang, Dong Liang, Hairong Zheng, Yihang Zhou

Outline

This paper addresses the reconstruction of visual information from brain activity. Earlier work has decoded images from fMRI using generative models, but accurately recovering highly complex visual stimuli remains difficult because such stimuli combine dense and diverse elements, intricate spatial structure, and multifaceted semantic information. To address this, the paper proposes HAVIR, a model built around two adapters. The AutoKL adapter transforms fMRI voxels into a latent diffusion prior that captures topological structure, and the CLIP adapter transforms the voxels into CLIP text and image embeddings that carry semantic information. Versatile Diffusion fuses these complementary representations to generate the final reconstructed image. To extract the most important semantic information in complex scenes, the CLIP adapter is trained with text captions describing the visual stimuli and with semantic images synthesized from those captions. According to the reported experiments, HAVIR reconstructs both the structural features and the semantic information of visual stimuli even in complex scenes, and it outperforms existing models.
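
The two-adapter data flow described above can be illustrated with a small shape-level sketch. This is not the authors' implementation: the adapters are reduced to single linear maps, and every name and size (`LinearAdapter`, `N_VOXELS`, the toy latent and embedding shapes) is a hypothetical choice made so the example runs quickly. The Versatile Diffusion fusion step is not modeled; the sketch only shows how one voxel vector is mapped to a structural latent plus CLIP-style text and image embeddings.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy sizes, chosen only so the sketch runs quickly (all hypothetical).
N_VOXELS = 256
LATENT_SHAPE = (4, 8, 8)   # stand-in for the AutoKL latent (channels, h, w)
TEXT_SHAPE = (4, 16)       # stand-in for CLIP text embeddings (tokens, dim)
IMAGE_SHAPE = (5, 16)      # stand-in for CLIP image embeddings (tokens, dim)


class LinearAdapter:
    """Hypothetical adapter: one linear map from voxels to a target shape."""

    def __init__(self, n_in, out_shape, rng):
        self.out_shape = out_shape
        n_out = int(np.prod(out_shape))
        # Scaled Gaussian initialisation; a real adapter would be trained.
        self.W = rng.normal(0.0, 1.0 / np.sqrt(n_in), size=(n_in, n_out))
        self.b = np.zeros(n_out)

    def __call__(self, voxels):
        # voxels: (batch, n_in) -> (batch, *out_shape)
        out = voxels @ self.W + self.b
        return out.reshape(voxels.shape[0], *self.out_shape)


def adapt(voxels, autokl_adapter, clip_text_adapter, clip_image_adapter):
    """Map fMRI voxels to the three conditioning signals named in the summary."""
    return {
        "latent": autokl_adapter(voxels),          # structural branch
        "clip_text": clip_text_adapter(voxels),    # semantic branch (text)
        "clip_image": clip_image_adapter(voxels),  # semantic branch (image)
    }


autokl = LinearAdapter(N_VOXELS, LATENT_SHAPE, rng)
clip_text = LinearAdapter(N_VOXELS, TEXT_SHAPE, rng)
clip_image = LinearAdapter(N_VOXELS, IMAGE_SHAPE, rng)

voxels = rng.normal(size=(2, N_VOXELS))  # two fake fMRI samples
conditions = adapt(voxels, autokl, clip_text, clip_image)
for name, value in conditions.items():
    print(name, value.shape)
```

In a full pipeline, the structural latent would presumably initialize or guide the diffusion process while the two CLIP-style embeddings condition it, but how HAVIR wires these into Versatile Diffusion is not specified in this summary.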

Takeaways, Limitations

Takeaways:
Proposes a novel method for accurately reconstructing complex visual stimuli from fMRI data
Improves performance by fusing the complementary representations produced by the AutoKL and CLIP adapters
Effectively restores both the structural features and the semantic information of complex visual stimuli
Demonstrates superior performance compared to existing models
Limitations:
Further research is needed on the generalization performance of the HAVIR model
Performance needs to be evaluated on various types of fMRI data
Further validation is needed to determine whether the reconstructions accurately match actual visual experience
The computational complexity and efficiency of the model need to be analyzed