This paper focuses on Situation Recognition (SR) in computer vision, the task of extracting structured semantic summaries from images. Existing SR methods treat verb classification as a single-label problem and overlook the inherent ambiguity that a single image can be described by multiple verb categories. To address this issue, we reformulate verb classification as a multi-label problem, specifically as a Single Positive Multi-Label Learning (SPMLL) problem. Because complete multi-label annotation is impractical for large-scale datasets, each training image carries only one positive verb label. To learn effectively under this constraint, we develop the Graph Enhanced Verb Multilayer Perceptron (GE-VerbMLP), which uses graph neural networks to capture label correlations and refines decision boundaries through adversarial training. Extensive experiments on real-world datasets show that the proposed method improves mean Average Precision (mAP) by more than 3% while remaining competitive on the conventional top-1 and top-5 accuracy metrics. Furthermore, we present a comprehensive multi-label evaluation benchmark for fairly assessing model performance in multi-label settings.
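To make the multi-label metric concrete, the sketch below computes a per-verb (macro-averaged) mean average precision from model scores and a multi-hot ground-truth matrix. It assumes non-interpolated AP, skips verbs with no positive images, and uses hypothetical function names; the exact mAP protocol of the benchmark described here may differ.

```python
from typing import Sequence


def average_precision(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Non-interpolated AP for one verb: mean precision at the rank of each positive."""
    # Rank images by descending score for this verb.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    hits = 0
    precision_sum = 0.0
    for rank, i in enumerate(order, start=1):
        if labels[i]:
            hits += 1
            precision_sum += hits / rank  # precision at this positive's rank
    return precision_sum / hits if hits else 0.0


def mean_average_precision(
    score_matrix: Sequence[Sequence[float]],
    label_matrix: Sequence[Sequence[int]],
) -> float:
    """Macro mAP over verb columns; rows are images, columns are verbs (assumed layout)."""
    num_verbs = len(score_matrix[0])
    aps = []
    for v in range(num_verbs):
        col_scores = [row[v] for row in score_matrix]
        col_labels = [row[v] for row in label_matrix]
        if any(col_labels):  # AP is undefined for verbs with no positive images
            aps.append(average_precision(col_scores, col_labels))
    return sum(aps) / len(aps) if aps else 0.0


if __name__ == "__main__":
    # Toy example: 3 images, 3 verbs; image 0 is validly described by two verbs.
    scores = [[0.9, 0.6, 0.1], [0.2, 0.8, 0.7], [0.4, 0.3, 0.5]]
    labels = [[1, 1, 0], [0, 1, 0], [1, 0, 1]]
    print(mean_average_precision(scores, labels))  # per-verb APs: 1.0, 1.0, 0.5
```

Unlike top-1 or top-5 accuracy, which only check whether the single annotated verb is ranked highly, this metric rewards ranking every valid verb above the invalid ones, which is why it suits a multi-label evaluation.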