Daily Arxiv

This is a page that curates AI-related papers published worldwide.
All content here is summarized using Google Gemini and operated on a non-profit basis.
Copyright for each paper belongs to the authors and their institutions; please make sure to credit the source when sharing.

TRKT: Weakly Supervised Dynamic Scene Graph Generation with Temporally-enhanced Relation-aware Knowledge Transferring

Created by
  • Haebom

Authors

Zhu Xu, Ting Lei, Zhimin Li, Guan Wang, Qingchao Chen, Yuxin Peng, Yang Liu

Outline

This paper addresses weakly supervised dynamic scene graph generation (WS-DSGG). Existing WS-DSGG methods generate pseudo-labels with pre-trained external object detectors, which are poorly suited to dynamic, relation-aware scenes and therefore produce inaccurate labels. To address this, the authors propose Temporally-enhanced Relation-aware Knowledge Transferring (TRKT). TRKT has two main components: relation-aware knowledge mining and a dual-stream fusion module. Relation-aware knowledge mining uses object and relation class decoders, together with an inter-frame attention augmentation strategy, to generate attention maps that are motion-aware and robust to motion blur. The dual-stream fusion module integrates these attention maps with the external detection results to improve the accuracy and reliability of object localization. Experiments on the Action Genome dataset show that TRKT achieves state-of-the-art performance.
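As a rough illustration of what combining attention maps with external detections can look like, the sketch below re-scores detector boxes by the average attention they cover. This is not the paper's dual-stream fusion module: the linear blend, the `alpha` weight, and the names `mean_attention` and `fuse_scores` are all assumptions made for this example.

```python
from typing import List, Tuple

Box = Tuple[int, int, int, int]   # (x1, y1, x2, y2) in pixels, x2/y2 exclusive
AttentionMap = List[List[float]]  # H x W grid of values in [0, 1]


def mean_attention(att: AttentionMap, box: Box) -> float:
    """Average attention value inside the box (0.0 for an empty box)."""
    x1, y1, x2, y2 = box
    values = [att[y][x] for y in range(y1, y2) for x in range(x1, x2)]
    return sum(values) / len(values) if values else 0.0


def fuse_scores(
    att: AttentionMap,
    detections: List[Tuple[Box, float]],
    alpha: float = 0.5,  # hypothetical mixing weight, not taken from the paper
) -> List[Tuple[Box, float]]:
    """Blend each detector confidence with the attention under its box.

    Returns (box, fused_score) pairs sorted from highest to lowest score.
    """
    fused = []
    for box, det_score in detections:
        score = alpha * det_score + (1.0 - alpha) * mean_attention(att, box)
        fused.append((box, score))
    return sorted(fused, key=lambda item: item[1], reverse=True)


# Toy 4x4 attention map with a bright 2x2 region in the centre.
attention = [
    [0.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 1.0, 0.0],
    [0.0, 1.0, 1.0, 0.0],
    [0.0, 0.0, 0.0, 0.0],
]
# Two external detections: a confident but misplaced box, and a less
# confident box that sits on the attended region.
detections = [((0, 0, 2, 2), 0.9), ((1, 1, 3, 3), 0.6)]
ranked = fuse_scores(attention, detections)
```

In this toy case the box that overlaps the attended region ends up ranked above the more confident but misplaced one, which is the kind of correction a fusion step is meant to provide.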

Takeaways, Limitations

Takeaways:
Proposes a novel method that addresses the inaccuracy of the external object detectors used by existing WS-DSGG approaches.
Improves object detection and relationship prediction by jointly considering temporal and relationship information.
Achieves state-of-the-art performance on the Action Genome dataset.
Supports reproducibility and follow-up research by releasing the code.
Limitations:
The method is evaluated only on the Action Genome dataset; further verification is needed to establish how well it generalizes to other datasets.
It does not completely remove the dependency on external object detectors, although it does improve on their results.
No analysis of computational cost or complexity is provided.