
Daily Arxiv

This page curates AI-related papers published worldwide.
All summaries here are generated with Google Gemini, and the site is operated on a non-profit basis.
Copyright for each paper belongs to its authors and their institutions; please credit the source when sharing.

UniEmoX: Cross-modal Semantic-Guided Large-Scale Pretraining for Universal Scene Emotion Perception

Created by
  • Haebom

Authors

Chuang Chen, Xiao Sun, Zhi Liu

Outline

In this paper, we propose UniEmoX, a novel large-scale pretraining framework grounded in psychological theories that addresses the generalization problem in visual emotion analysis. UniEmoX integrates scene-centric and person-centric low-level spatial structure in images to derive more subtle and discriminative emotion representations, and it distills rich semantic information from the CLIP model to enhance the emotion embeddings. We also present Emo8, a new emotion dataset containing images in diverse styles (cartoon, natural, realistic, science-fiction, and advertising). Experimental results on multiple benchmark datasets demonstrate the effectiveness of UniEmoX.
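To make the CLIP-guided idea concrete, here is a minimal sketch of how fused scene/person features could be aligned with CLIP's semantic image features. This is not the authors' implementation: the `EmotionEmbedder` module, the feature dimensions, the 8-way head, and the random inputs are all illustrative assumptions; only the Hugging Face CLIP calls are real APIs.

```python
import numpy as np
import torch
import torch.nn as nn
import torch.nn.functional as F
from PIL import Image
from transformers import CLIPModel, CLIPProcessor


class EmotionEmbedder(nn.Module):
    """Hypothetical head: fuses scene-level and person-level features and
    projects the result into CLIP's embedding space, so the joint emotion
    embedding can be aligned with CLIP's semantic features."""

    def __init__(self, feat_dim=512, clip_dim=512, num_emotions=8):
        super().__init__()
        self.fuse = nn.Linear(2 * feat_dim, clip_dim)      # scene + person -> joint embedding
        self.classify = nn.Linear(clip_dim, num_emotions)  # 8-way emotion head (Emo8-style)

    def forward(self, scene_feat, person_feat):
        joint = self.fuse(torch.cat([scene_feat, person_feat], dim=-1))
        return joint, self.classify(joint)


# Frozen CLIP acts as the semantic teacher.
clip = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")
clip.eval()

# Random stand-ins: in the paper these would come from the framework's
# scene-centric and person-centric encoder branches.
scene_feat = torch.randn(4, 512)
person_feat = torch.randn(4, 512)
images = [Image.fromarray(np.uint8(np.random.rand(224, 224, 3) * 255)) for _ in range(4)]

with torch.no_grad():
    inputs = processor(images=images, return_tensors="pt")
    clip_feat = clip.get_image_features(**inputs)  # (4, 512) semantic features

model = EmotionEmbedder()
joint, logits = model(scene_feat, person_feat)

# Alignment loss: pull the fused emotion embedding toward CLIP's semantic
# embedding of the same image (cosine distance), alongside classification.
align_loss = (1 - F.cosine_similarity(joint, clip_feat.detach(), dim=-1)).mean()
print(align_loss.item(), logits.shape)  # scalar loss, torch.Size([4, 8])
```

In an actual pretraining loop, this alignment term would be combined with an emotion-classification loss so the embeddings stay both semantically rich and discriminative.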

Takeaways and Limitations

Takeaways:
A new visual emotion analysis framework grounded in psychological theory is presented.
Large-scale pretraining improves generalization across diverse scenarios.
Emo8, a new emotion dataset, is released.
Integrating scene-centric and person-centric information yields more refined emotion representations (see the sketch above).
Leveraging the CLIP model improves the use of semantic information in the emotion embeddings.
Limitations:
The size and diversity of the Emo8 dataset warrant further scrutiny.
A more detailed analysis of how UniEmoX compares with other state-of-the-art models is needed.
The model may be biased toward certain emotions, and mitigation strategies should be proposed.
Further research is needed on performance evaluation and applicability in real-world settings.