
Daily Arxiv

This is a page that curates AI-related papers published worldwide.
All summaries on this page are generated with Google Gemini, and the page is run on a non-profit basis.
Copyright for each paper belongs to the authors and their institutions; please make sure to credit the source when sharing.

Convergent transformations of visual representation in brains and models

Created by
  • Haebom

Author

Pablo Marcos-Manchón, Lluís Fuentemilla

Outline

This paper addresses a fundamental question in cognitive neuroscience: is visual perception shaped mainly by the structure of the external world or by the internal structure of the brain? Because natural stimuli elicit similar patterns of brain activity across individuals, we ask whether the stimulus-driven convergence in the transformation from sensory representations to high-level internal representations follows a common path in humans and deep neural networks (DNNs). We introduce a unified framework that combines cross-individual similarity with alignment to model hierarchy in order to track how representations flow through the cortex. Applying it to three independent fMRI datasets, we find that the cortex-wide network preserved across individuals comprises two pathways: a medial-ventral pathway for scene structure and a lateral-dorsal pathway tuned to social and biological content. This functional organization is captured by the hierarchical structure of vision DNNs but not by language models, underscoring the specificity of the visual-to-semantic transformation. Overall, we show that the convergent computational solutions for visual encoding in human and artificial vision are driven by the structure of the external world.
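To make "cross-individual similarity" and "alignment with model hierarchy" more concrete, here is a minimal sketch in the style of representational similarity analysis (RSA). It is an assumption, not the authors' pipeline: each subject's responses (stimuli × voxels) become a correlation-distance representational dissimilarity matrix (RDM), cross-individual similarity is the mean leave-one-out Spearman correlation between each subject's RDM and the average of the others, and hierarchy alignment reports which model layer's RDM best matches the group RDM, expressed as a relative depth. All function names (`rdm`, `cross_subject_similarity`, `layer_alignment`) and the synthetic data are hypothetical and for illustration only.

```python
import numpy as np


def rdm(responses: np.ndarray) -> np.ndarray:
    """Correlation-distance RDM from an (n_stimuli, n_features) response matrix."""
    return 1.0 - np.corrcoef(responses)


def upper(m: np.ndarray) -> np.ndarray:
    """Vectorize the strict upper triangle of a square matrix."""
    i, j = np.triu_indices(m.shape[0], k=1)
    return m[i, j]


def spearman(a: np.ndarray, b: np.ndarray) -> float:
    """Spearman correlation computed from ranks (assumes no ties, as with continuous data)."""
    ra = np.argsort(np.argsort(a)).astype(float)
    rb = np.argsort(np.argsort(b)).astype(float)
    return float(np.corrcoef(ra, rb)[0, 1])


def cross_subject_similarity(subject_rdms: list[np.ndarray]) -> float:
    """Mean leave-one-out correlation between each subject's RDM and the mean RDM of the others."""
    vecs = np.array([upper(r) for r in subject_rdms])
    scores = []
    for s in range(len(vecs)):
        others = np.delete(vecs, s, axis=0).mean(axis=0)
        scores.append(spearman(vecs[s], others))
    return float(np.mean(scores))


def layer_alignment(brain_rdm: np.ndarray, layer_rdms: list[np.ndarray]) -> tuple[np.ndarray, float]:
    """Brain-to-layer RDM correlations and the relative depth (0 = first, 1 = last) of the best layer."""
    scores = np.array([spearman(upper(brain_rdm), upper(layer)) for layer in layer_rdms])
    best = int(np.argmax(scores))
    depth = best / (len(layer_rdms) - 1) if len(layer_rdms) > 1 else 0.0
    return scores, depth


# Synthetic demo: all hypothetical "subjects" see stimuli sharing one latent structure.
rng = np.random.default_rng(0)
n_stim, n_latent, n_vox = 40, 10, 200
latent = rng.standard_normal((n_stim, n_latent))
subjects = [
    latent @ rng.standard_normal((n_latent, n_vox)) + 0.5 * rng.standard_normal((n_stim, n_vox))
    for _ in range(5)
]
# Control: subjects whose responses share no stimulus structure.
random_subjects = [rng.standard_normal((n_stim, n_vox)) for _ in range(5)]

structured_score = cross_subject_similarity([rdm(x) for x in subjects])
random_score = cross_subject_similarity([rdm(x) for x in random_subjects])

# Hypothetical three-layer "model": unrelated, partial, then full stimulus structure.
layers = [
    rng.standard_normal((n_stim, 100)),
    latent[:, :3] @ rng.standard_normal((3, 100)),
    latent @ rng.standard_normal((n_latent, 100)),
]
group_rdm = np.mean([rdm(x) for x in subjects], axis=0)
scores, depth = layer_alignment(group_rdm, [rdm(x) for x in layers])
print(f"cross-subject: {structured_score:.2f} vs random {random_score:.2f}; best depth {depth}")
```

In this toy setup, responses driven by a shared stimulus structure yield high cross-subject similarity while the random control does not, and the group RDM aligns best with the layer that encodes the full latent structure. The real analysis would use measured fMRI responses per region and actual DNN layer activations, and the choice of distance and correlation measures here is only one common option.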

Takeaways, Limitations

Takeaways:
Demonstrates that humans and deep neural networks (DNNs) arrive at convergent computational solutions for visual information processing.
Elucidates the functional roles of the medial-ventral pathway (scene structure) and the lateral-dorsal pathway (social and biological content) in visual scene processing.
Shows that the hierarchical structure of vision DNNs, unlike that of language models, captures the visual-to-semantic transformation.
Suggests that the structure of the external world plays a major role in shaping visual perception.
Limitations:
The results rest on specific fMRI datasets, which may limit how far they generalize.
Current DNN models have inherent limitations and cannot fully reproduce the human visual system.
Generalizability to other sensory modalities is limited.