Daily Arxiv

This page curates AI-related papers published worldwide.
All summaries here are generated with Google Gemini, and the page is run on a non-profit basis.
Copyright for each paper belongs to the authors and their institutions; please make sure to credit the source when sharing.

The Demon is in Ambiguity: Revisiting Situation Recognition with Single Positive Multi-Label Learning

Created by
  • Haebom

Authors

Yiming Lin, Yuchen Niu, Shang Wang, Kaizhu Huang, Qiufeng Wang, Xiao-Bo Jin

Outline

This paper focuses on Situation Recognition (SR), a computer vision task that extracts structured semantic summaries from images. Existing SR methods treat verb classification as a single-label problem, but a single image can often be described equally well by several verb categories. To address this ambiguity, the authors recast verb classification as a multi-label problem. Because complete multi-label annotation is impractical for large-scale datasets, they formulate it specifically as Single Positive Multi-Label Learning (SPMLL), in which each image carries only one annotated positive verb. They then develop the Graph Enhanced Verb Multilayer Perceptron (GE-VerbMLP), which uses graph neural networks to capture label correlations and adversarial training to optimize decision boundaries. Extensive experiments on real-world datasets show that the proposed method improves Mean Average Precision (MAP) by more than 3% while remaining competitive on the conventional top-1 and top-5 accuracy metrics. The paper also presents a comprehensive multi-label evaluation benchmark for fairly assessing model performance in multi-label settings.
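To make the single-positive setting concrete, here is a minimal sketch of the common "assume negative" baseline loss, in which the one annotated verb is treated as positive and every unannotated verb as negative. This is a generic SPMLL baseline, not the GE-VerbMLP objective from the paper; the function names, verb list, and logit values are all hypothetical and chosen only for illustration.

```python
import math

def sigmoid(x: float) -> float:
    """Logistic function mapping a logit to a probability."""
    return 1.0 / (1.0 + math.exp(-x))

def assume_negative_bce(logits, observed_positive):
    """Binary cross-entropy averaged over all verb classes.

    Only one verb index is annotated as positive; every other verb is
    treated as a negative, even if it would also describe the image.
    """
    eps = 1e-12  # guards against log(0)
    total = 0.0
    for k, z in enumerate(logits):
        p = sigmoid(z)
        target = 1.0 if k == observed_positive else 0.0
        total -= target * math.log(p + eps) + (1.0 - target) * math.log(1.0 - p + eps)
    return total / len(logits)

# Hypothetical example: the image is annotated "running" (index 0),
# but "jogging" (index 1) is an equally plausible, unannotated verb.
verbs = ["running", "jogging", "swimming"]
confident_on_jogging = assume_negative_bce([2.0, 1.5, -3.0], observed_positive=0)
dismissive_of_jogging = assume_negative_bce([2.0, -1.5, -3.0], observed_positive=0)
print(confident_on_jogging > dismissive_of_jogging)
```

The comparison shows why ambiguity matters: under this baseline, a model is penalized for assigning high probability to a plausible but unannotated verb, which is the kind of false-negative pressure that SPMLL methods aim to reduce.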

Takeaways, Limitations

Takeaways:
The paper highlights the importance of multi-label verb classification that accounts for image ambiguity and frames the problem from a new perspective, single-positive multi-label learning (SPMLL).
The GE-VerbMLP model improves multi-label verb classification performance, with a MAP gain of over 3%.
We provide a new evaluation benchmark for multi-label setups.
Limitations:
The SPMLL formulation is motivated by the difficulty of obtaining complete multi-label annotations for large-scale datasets, and that same annotation difficulty may still limit its practical application.
The performance improvements of the GE-VerbMLP model may be specific to the datasets evaluated; further research is needed to determine how well it generalizes to other datasets or situations.
Further validation of the generality and versatility of the proposed evaluation benchmark is needed.
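
For readers unfamiliar with the MAP metric cited above, the following is a minimal sketch of one common definition: average precision is computed per class over all images, then averaged across classes. The paper's benchmark may define or aggregate the metric differently; the function names and the toy scores and labels are hypothetical.

```python
def average_precision(scores, labels):
    """AP for one class: mean of precision@k over ranks k holding a true positive."""
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    hits = 0
    precisions = []
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            precisions.append(hits / rank)
    return sum(precisions) / len(precisions) if precisions else 0.0

def mean_average_precision(score_matrix, label_matrix):
    """Mean of per-class AP; rows are images, columns are classes.

    Classes with no positive label are skipped, since AP is undefined for them.
    """
    num_classes = len(score_matrix[0])
    aps = []
    for k in range(num_classes):
        class_scores = [row[k] for row in score_matrix]
        class_labels = [row[k] for row in label_matrix]
        if any(class_labels):
            aps.append(average_precision(class_scores, class_labels))
    return sum(aps) / len(aps) if aps else 0.0

# Hypothetical toy data: 3 images, 2 verb classes, multi-label ground truth.
scores = [[0.9, 0.2], [0.8, 0.7], [0.1, 0.6]]
labels = [[1, 0], [0, 1], [1, 1]]
map_value = mean_average_precision(scores, labels)
print(round(map_value, 4))
```

Unlike top-1 accuracy, this metric credits a model for ranking every valid verb highly, which is why it suits a multi-label evaluation.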