Daily Arxiv

This is a page that curates AI-related papers published worldwide.
All content here is summarized using Google Gemini and operated on a non-profit basis.
Copyright for each paper belongs to the authors and their institutions; please make sure to credit the source when sharing.

Extending Foundational Monocular Depth Estimators to Fisheye Cameras with Calibration Tokens

Created by
  • Haebom

Authors

Suchisrit Gangopadhyay, Jung-Hee Kim, Xien Chen, Patrick Rim, Hyoungseob Park, Alex Wong

Outline

This paper proposes a method for adapting foundational monocular depth estimators (FMDEs), trained on conventional perspective images, to fisheye images. Although FMDEs are trained on tens of millions of images, they remain susceptible to covariate shift caused by changes in camera calibration (intrinsic and distortion) parameters, which leads to incorrect depth estimates. The proposed method aligns the distribution of latent embeddings of fisheye images with that of perspective images, so that FMDEs can be reused on fisheye cameras without retraining or fine-tuning. To this end, the authors introduce a set of calibration tokens, a lightweight adaptation mechanism that modulates the latent embeddings to achieve this alignment. They hypothesize that exploiting the already expressive latent space of FMDEs avoids the negative effects, such as artifacts, of conventional recalibration or map projection to a canonical reference frame in image space. The method is self-supervised and trains on large, publicly available perspective image datasets, so it requires no fisheye images: during training, perspective images are recalibrated into fisheye images and consistency is enforced between the estimates for the two versions. The approach is evaluated in both indoor and outdoor environments with multiple FMDEs and shows consistent improvements over state-of-the-art methods with just a single set of tokens. The code is available at https://github.com/JungHeeKim29/calibration-token.
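The recalibration step mentioned above, turning a perspective image into a fisheye image, can be illustrated with a minimal sketch. It assumes an equidistant fisheye model (r = f·θ), a principal point at the image centre, and nearest-neighbour sampling; the summary does not say which distortion model the authors use, and the function name and parameters here are hypothetical.

```python
import numpy as np

def perspective_to_fisheye(image, f_persp, f_fish):
    """Resample a perspective image as if seen through an equidistant fisheye lens.

    Assumptions (not from the paper): principal point at the image centre,
    equidistant model r = f * theta, nearest-neighbour sampling.
    Returns the warped image and a mask of pixels that received a value.
    """
    h, w = image.shape[:2]
    cx, cy = (w - 1) / 2.0, (h - 1) / 2.0

    # Coordinates of every output (fisheye) pixel relative to the centre.
    v, u = np.mgrid[0:h, 0:w].astype(np.float64)
    dx, dy = u - cx, v - cy
    r_fish = np.hypot(dx, dy)

    # Equidistant fisheye: r_fish = f_fish * theta, so theta = r_fish / f_fish.
    theta = r_fish / f_fish
    # Rays at 90 degrees or more never intersect the perspective image plane.
    valid = theta < (np.pi / 2 - 1e-6)

    # Pinhole projection of the same ray: r_persp = f_persp * tan(theta).
    r_persp = f_persp * np.tan(np.where(valid, theta, 0.0))

    # Radial scale from fisheye radius to perspective radius
    # (the limit at the centre is f_persp / f_fish).
    scale = np.where(r_fish > 0, r_persp / np.maximum(r_fish, 1e-12), f_persp / f_fish)
    src_u = np.rint(cx + dx * scale).astype(np.int64)
    src_v = np.rint(cy + dy * scale).astype(np.int64)

    # Keep only output pixels whose source falls inside the perspective image.
    valid &= (src_u >= 0) & (src_u < w) & (src_v >= 0) & (src_v < h)

    out = np.zeros_like(image)
    out[valid] = image[src_v[valid], src_u[valid]]
    return out, valid
```

Calling `perspective_to_fisheye(img, f_persp=500.0, f_fish=300.0)` on an `H x W` or `H x W x 3` array returns the warped image together with a validity mask. Bilinear interpolation and a richer distortion model would be natural refinements.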

Takeaways, Limitations

Takeaways:
  • Making existing monocular depth estimation models applicable to fisheye images broadens the range of applications that can use fisheye cameras.
  • Lightweight calibration tokens adapt the models to fisheye images without retraining or fine-tuning.
  • Operating in latent space rather than transforming the image makes adaptation efficient and reduces artifacts.
  • Self-supervised training removes the need for a fisheye image dataset.
Limitations:
  • The generalization of the calibration tokens needs further study, in particular across different fisheye camera models and distortion levels.
  • Performance may depend on the FMDEs and perspective image datasets used.
  • Further evaluation on real fisheye image datasets may be needed.
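To make the calibration-token idea more concrete, here is a toy forward pass in NumPy. It rests on several assumptions that are not confirmed by this summary: the tokens are treated like prompt tokens concatenated to the patch sequence of a frozen encoder, the encoder is a single self-attention layer with random weights, depth comes from a linear head, and the consistency loss ignores the spatial warp between the two views. All names and sizes are hypothetical, and there is no training loop; in practice an autograd framework would update only the tokens.

```python
import numpy as np

rng = np.random.default_rng(0)
DIM, NUM_PATCHES, NUM_TOKENS = 16, 8, 4  # hypothetical sizes

# Frozen stand-ins for a pretrained encoder layer and depth head (random weights).
W_q, W_k, W_v = (rng.normal(scale=DIM ** -0.5, size=(DIM, DIM)) for _ in range(3))
w_head = rng.normal(scale=DIM ** -0.5, size=DIM)

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def frozen_layer(seq):
    """Single-head self-attention with a residual connection; weights never change."""
    q, k, v = seq @ W_q, seq @ W_k, seq @ W_v
    attn = softmax(q @ k.T / np.sqrt(DIM))
    return seq + attn @ v

def predict_depth(patches, calibration_tokens=None):
    """Per-patch depth; optional tokens join the sequence and are dropped afterwards."""
    if calibration_tokens is None:
        embeddings = frozen_layer(patches)
    else:
        seq = np.concatenate([patches, calibration_tokens], axis=0)
        embeddings = frozen_layer(seq)[: len(patches)]
    return embeddings @ w_head

def consistency_loss(persp_patches, fisheye_patches, calibration_tokens):
    """Mean squared gap between the perspective estimate (no tokens, used as the
    target) and the fisheye estimate (with tokens). Simplification: assumes the
    patches of the two views already correspond one to one."""
    target = predict_depth(persp_patches)
    pred = predict_depth(fisheye_patches, calibration_tokens)
    return float(np.mean((pred - target) ** 2))

# Toy data: "fisheye" patches are a perturbed copy of the perspective patches.
persp = rng.normal(size=(NUM_PATCHES, DIM))
fisheye = persp + 0.1 * rng.normal(size=(NUM_PATCHES, DIM))
tokens = 0.01 * rng.normal(size=(NUM_TOKENS, DIM))  # the only trainable parameters
loss = consistency_loss(persp, fisheye, tokens)
```

Because the encoder and head stay fixed, the number of trainable values in this sketch is just `NUM_TOKENS * DIM`, which is what makes this style of adaptation lightweight.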