This is a page that curates AI-related papers published worldwide. All content here is summarized using Google Gemini and operated on a non-profit basis. Copyright for each paper belongs to the authors and their institutions; please make sure to credit the source when sharing.
Prompt Guidance and Human Proximal Perception for HOT Prediction with Regional Joint Loss
Created by: Haebom
Authors: Yuxiao Wang, Yu Lei, Zhenao Wei, Weiying Xue, Xinyu Jiang, Nan Zhuang, Qi Liu
Outline
This paper proposes P3HOT, a novel framework for human-object contact (HOT) detection. P3HOT combines prompt guidance with human proximal perception. It uses the correlation between the image and the text prompts to steer the network's attention toward relevant regions, and it uses learnable parameters to filter out regions where no interaction is expected. It also uses depth information to resolve the ambiguity that arises when people and objects overlap in a 2D view, which gives the model a quasi-3D perspective. Finally, it introduces a regional joint loss (RJLoss) that suppresses predictions of abnormal categories within the same region. The authors also propose a new evaluation metric, AD-Acc., to address the shortcomings of existing methods. Experiments show that P3HOT achieves state-of-the-art results on all four metrics across two benchmark datasets. On the HOT-Annotated dataset in particular, it improves SC-Acc., mIoU, wIoU, and AD-Acc. by 0.7, 2.0, 1.6, and 11.0, respectively. The source code is available on GitHub (https://github.com/YuxiaoWang-AI/P3HOT).
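mIoU and wIoU are pixel-level segmentation scores. To help read the reported gains, here is a minimal Python sketch that computes both from a confusion matrix over flattened per-pixel labels. It assumes wIoU is the common frequency-weighted IoU, where each class's IoU is weighted by that class's share of ground-truth pixels. The exact definitions used by the HOT benchmark and by P3HOT may differ. SC-Acc. and AD-Acc. are not covered here. All function names are hypothetical and do not come from the P3HOT code.

```python
import math
from typing import List, Sequence


def confusion_matrix(pred: Sequence[int], target: Sequence[int], num_classes: int) -> List[List[int]]:
    """Count pixels: rows are ground-truth classes, columns are predicted classes."""
    cm = [[0] * num_classes for _ in range(num_classes)]
    for p, t in zip(pred, target):
        cm[t][p] += 1
    return cm


def per_class_iou(cm: List[List[int]]) -> List[float]:
    """IoU = TP / (TP + FP + FN) per class; NaN if the class never appears at all."""
    n = len(cm)
    ious = []
    for c in range(n):
        tp = cm[c][c]
        fn = sum(cm[c]) - tp                        # class c pixels predicted as another class
        fp = sum(cm[r][c] for r in range(n)) - tp   # other classes' pixels predicted as c
        denom = tp + fp + fn
        ious.append(tp / denom if denom > 0 else float("nan"))
    return ious


def mean_iou(cm: List[List[int]]) -> float:
    """Unweighted mean over classes that appear in either prediction or ground truth."""
    ious = [x for x in per_class_iou(cm) if not math.isnan(x)]
    return sum(ious) / len(ious)


def frequency_weighted_iou(cm: List[List[int]]) -> float:
    """Assumed wIoU: each class's IoU weighted by its fraction of ground-truth pixels."""
    total = sum(sum(row) for row in cm)
    score = 0.0
    for c, iou in enumerate(per_class_iou(cm)):
        if not math.isnan(iou):
            score += (sum(cm[c]) / total) * iou
    return score


# Toy example: 6 pixels, 3 classes (e.g. background plus two hypothetical body-part labels).
cm = confusion_matrix(pred=[0, 0, 1, 1, 2, 2], target=[0, 0, 1, 2, 2, 2], num_classes=3)
print(round(mean_iou(cm), 4))                # 0.7222
print(round(frequency_weighted_iou(cm), 4))  # 0.75
```

Because frequency weighting rewards accuracy on common classes, the two scores can move by different amounts, which is consistent with mIoU and wIoU being reported separately.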