Daily Arxiv

This page curates AI-related papers published worldwide.
All content here is summarized using Google Gemini, and the page is operated on a non-profit basis.
Copyright for each paper belongs to the authors and their institutions; please make sure to credit the source when sharing.

Multimodal Fusion SLAM with Fourier Attention

Created by
  • Haebom

Authors

Youjie Zhou, Guofeng Mei, Yiming Wang, Yi Wan, Fabio Poiesi

Outline

This paper proposes FMF-SLAM, an efficient multimodal fusion method for visual simultaneous localization and mapping (SLAM) under challenging conditions such as noise, changing illumination, and darkness. FMF-SLAM uses the fast Fourier transform (FFT) to improve algorithmic efficiency and introduces novel Fourier-based self-attention and cross-attention mechanisms to extract features from RGB and depth signals. It further strengthens the interaction between modalities through multi-scale knowledge distillation. To show feasibility in real-world scenarios, the authors fuse FMF-SLAM with GNSS-RTK and global bundle adjustment and integrate it into a security robot, where it runs in real time. Evaluations on video sequences from TUM, TartanAir, and real-world datasets show state-of-the-art performance under noise, changing illumination, and dark conditions. The code and datasets are available at https://github.com/youjie-zhou/FMF-SLAM.git.
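To give a rough feel for why FFT-based mixing can be cheaper than standard attention, here is a minimal NumPy sketch. It is not the FMF-SLAM implementation: the self-mixing step is the FNet-style Fourier transform of a feature map, and the cross-modal step is a plain circular cross-correlation computed through the convolution theorem. The function names `fourier_self_mix` and `fourier_cross_mix`, and the (tokens, channels) feature layout, are assumptions made for illustration only.

```python
import numpy as np


def fourier_self_mix(x: np.ndarray) -> np.ndarray:
    """FNet-style mixing (illustrative, not the paper's layer).

    Applies a 2D FFT over the (tokens, channels) axes and keeps the
    real part, so every output entry depends on every input entry.
    """
    return np.fft.fft2(x).real


def fourier_cross_mix(rgb: np.ndarray, depth: np.ndarray) -> np.ndarray:
    """Circular cross-correlation of two modalities along the token axis.

    Computed per channel in the frequency domain, which costs
    O(N log N) in the number of tokens N instead of the O(N^2)
    needed to compare every token pair directly.
    """
    rgb_f = np.fft.fft(rgb, axis=0)
    depth_f = np.fft.fft(depth, axis=0)
    # Multiplying by the conjugate spectrum yields correlation, not convolution.
    return np.fft.ifft(rgb_f * np.conj(depth_f), axis=0).real


# Hypothetical feature maps: 64 tokens with 16 channels per modality.
rng = np.random.default_rng(0)
rgb_feat = rng.standard_normal((64, 16))
depth_feat = rng.standard_normal((64, 16))

print(fourier_self_mix(rgb_feat).shape)
print(fourier_cross_mix(rgb_feat, depth_feat).shape)
```

Both functions preserve the input shape, so under these assumptions they could stand in for a token-mixing step inside a feature encoder. The attention mechanisms actually used in FMF-SLAM are defined in the paper and the linked repository.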

Takeaways, Limitations

Takeaways:
Presents an efficient FFT-based multimodal fusion SLAM method.
Extracts features from RGB and depth signals effectively with Fourier-based self-attention and cross-attention mechanisms.
Improves performance through multi-scale knowledge distillation across modalities.
Verifies real-time performance by integrating the method into a real robotic system.
Achieves state-of-the-art performance in noisy, changing-illumination, and dark environments.
Releases the code and datasets publicly.
Limitations:
Further research is needed on the generalization performance of the proposed method.
Further evaluation of robustness in various environments is needed.
A more detailed analysis of computational cost and memory usage is needed.