
Daily Arxiv

This is a page that curates AI-related papers published worldwide.
All content here is summarized using Google Gemini, and the page is operated on a non-profit basis.
Copyright for each paper belongs to the authors and their institutions; please make sure to credit the source when sharing.

HeCoFuse: Cross-Modal Complementary V2X Cooperative Perception with Heterogeneous Sensors

Created by
  • Haebom

Authors

Chuheng Wei, Ziye Qin, Walter Zimmer, Guoyuan Wu, Matthew J. Barth

Outline

In this paper, the authors propose HeCoFuse, a unified framework that addresses the challenges of real-world vehicle-to-everything (V2X) cooperative perception when nodes operate with heterogeneous sensor configurations: each node may carry cameras (C), LiDARs (L), or both. To handle the misalignment and unequal representation quality of cross-modal features, HeCoFuse introduces a hierarchical fusion mechanism that adaptively weights features through a combination of channel-wise and spatial attention. An adaptive spatial resolution adjustment module balances computational cost against fusion effectiveness, and a cooperative learning strategy dynamically switches the fusion type according to the modalities available at each node, improving robustness to diverse configurations. On the real-world TUMTraf-V2X dataset, HeCoFuse achieves 43.22% 3D mAP under the full sensor configuration (LC+LC), outperforming the CoopDet3D baseline by 1.17%, and reaches 43.38% 3D mAP in the L+LC scenario; across nine heterogeneous sensor configurations it maintains 3D mAP between 21.74% and 43.38%, ranking first in the CVPR 2025 DriveX Challenge.
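The attention-weighted fusion described above can be illustrated with a minimal PyTorch sketch. The module name (HierarchicalAttentionFusion), tensor shapes, channel sizes, and the way missing modalities are passed through are illustrative assumptions for this summary, not the authors' HeCoFuse implementation.

```python
# Minimal sketch of attention-weighted camera/LiDAR BEV feature fusion.
# Module names, shapes, and missing-modality handling are assumptions,
# not the authors' HeCoFuse code.
import torch
import torch.nn as nn


class HierarchicalAttentionFusion(nn.Module):
    """Fuses camera and LiDAR BEV features with channel-wise and spatial attention."""

    def __init__(self, channels: int):
        super().__init__()
        # Channel-wise attention: squeeze spatial dims, re-weight each channel.
        self.channel_attn = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(2 * channels, 2 * channels, kernel_size=1),
            nn.Sigmoid(),
        )
        # Spatial attention: re-weight each BEV cell from pooled channel statistics.
        self.spatial_attn = nn.Sequential(
            nn.Conv2d(2, 1, kernel_size=7, padding=3),
            nn.Sigmoid(),
        )
        # Project the concatenated features back to a single-modality width.
        self.proj = nn.Conv2d(2 * channels, channels, kernel_size=1)

    def forward(self, cam_feat=None, lidar_feat=None):
        # If a node lacks one modality, pass the available one through unchanged
        # (a stand-in for configuration-dependent fusion switching).
        if cam_feat is None:
            return lidar_feat
        if lidar_feat is None:
            return cam_feat

        x = torch.cat([cam_feat, lidar_feat], dim=1)       # (B, 2C, H, W)
        x = x * self.channel_attn(x)                        # channel re-weighting
        pooled = torch.cat(
            [x.mean(dim=1, keepdim=True), x.amax(dim=1, keepdim=True)], dim=1
        )
        x = x * self.spatial_attn(pooled)                    # spatial re-weighting
        return self.proj(x)                                  # (B, C, H, W)


if __name__ == "__main__":
    fusion = HierarchicalAttentionFusion(channels=64)
    cam = torch.randn(1, 64, 128, 128)      # camera BEV features
    lidar = torch.randn(1, 64, 128, 128)    # LiDAR BEV features
    print(fusion(cam, lidar).shape)          # torch.Size([1, 64, 128, 128])
    print(fusion(None, lidar).shape)         # LiDAR-only node
```

In this sketch the same module degrades gracefully when a node contributes only one modality, which mirrors (at a high level) why a configuration-aware fusion strategy can keep detection working across the nine heterogeneous sensor setups reported in the paper.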

Takeaways, Limitations

Takeaways:
HeCoFuse provides an effective unified framework for V2X cooperative perception under heterogeneous sensor configurations.
The hierarchical fusion mechanism and adaptive spatial resolution adjustment module deliver robust performance across diverse sensor setups.
Achieves state-of-the-art performance on the TUMTraf-V2X dataset and first place in the CVPR 2025 DriveX Challenge.
Demonstrates strong robustness across varied sensor deployment environments.
Limitations:
Evaluation is limited to the TUMTraf-V2X dataset; generalization to other datasets requires further validation.
The benchmark may not fully capture the complexity of real-world road environments.
A detailed analysis of computational cost and optimization strategies remains for future work.