Daily Arxiv

This is a page that curates AI-related papers published worldwide.
All content here is summarized using Google Gemini and operated on a non-profit basis.
Copyright for each paper belongs to the authors and their institutions; please make sure to credit the source when sharing.

SGDFuse: SAM-Guided Diffusion for High-Fidelity Infrared and Visible Image Fusion

Created by
  • Haebom

Authors

Xiaoyang Zhang, Jinjiang Li, Guodong Fan, Yakun Ju, Linwei Fan, Jun Liu, Alex C. Kot

Outline

This paper proposes SGDFuse, a conditional diffusion model guided by the Segment Anything Model (SAM), to address two shortcomings of existing infrared-visible image fusion (IVIF) methods: a lack of deep semantic understanding, and artifacts and loss of detail introduced during fusion. SGDFuse uses the high-quality semantic masks generated by SAM as explicit prior information to steer the fusion process through the conditional diffusion model. The framework works in two stages: it first performs a preliminary fusion of multimodal features, and then uses the SAM semantic masks together with the preliminary fused image as the condition for the diffusion model's coarse-to-fine denoising generation. This gives the fusion process explicit semantic direction while preserving high fidelity in the final result. According to the authors, experiments show that SGDFuse achieves state-of-the-art performance in both subjective and objective evaluations, and that its fused images remain useful for downstream tasks. The source code is available on GitHub.
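
The two-stage flow described above can be illustrated with a toy sketch. This is not the authors' implementation: the function names (`preliminary_fusion`, `build_condition`, `toy_denoiser`, `conditional_denoise`), the element-wise maximum used for stage one, the hand-written "denoiser", and the linear blending schedule are all assumptions made for illustration. In the paper, both stages are learned networks and the mask comes from SAM; here the mask is a hard-coded binary array.

```python
import numpy as np

def preliminary_fusion(ir, vis):
    """Stage 1 stand-in (assumption): element-wise maximum of the two modalities."""
    return np.maximum(ir, vis)

def build_condition(mask, fused):
    """Stack the semantic mask and the preliminary fused image as two channels."""
    return np.stack([mask, fused], axis=0)

def toy_denoiser(x, cond, t):
    """Hypothetical placeholder for a learned network that predicts the clean image.
    It ignores x and t and returns the preliminary fusion, brightened inside the mask."""
    mask, fused = cond
    return np.clip(fused * (1.0 + 0.1 * mask), 0.0, 1.0)

def conditional_denoise(cond, steps=10, seed=0):
    """Stage 2 stand-in: start from Gaussian noise and move step by step
    toward the conditioned prediction (a simple blend, not a real DDPM sampler)."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(cond.shape[1:])
    for t in range(steps, 0, -1):
        x0_hat = toy_denoiser(x, cond, t)
        alpha = (t - 1) / steps  # fraction of the current noisy sample that is kept
        x = (1.0 - alpha) * x0_hat + alpha * x
    return x

# Synthetic inputs in [0, 1]; the mask marks a hypothetical salient region.
rng = np.random.default_rng(42)
ir = rng.random((8, 8))
vis = rng.random((8, 8))
mask = np.zeros((8, 8))
mask[2:6, 2:6] = 1.0

fused = preliminary_fusion(ir, vis)
result = conditional_denoise(build_condition(mask, fused))
print(result.shape)
```

The point of the sketch is only the data flow: the mask and the preliminary fused image travel together as the condition, and every denoising step consults that condition, which is how semantic information can steer the result from coarse to fine.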

Takeaways, Limitations

Takeaways:
The paper presents a novel method that uses SAM to achieve semantically accurate, high-quality fusion of infrared and visible images.
A conditional diffusion model provides fine-grained control over the fusion process and yields high-fidelity results.
State-of-the-art performance is reported in both subjective and objective evaluations, along with applicability to downstream tasks.
Open source code supports reproducibility and extensibility.
Limitations:
Results may depend on the performance of SAM: errors in the masks SAM generates can propagate to the fusion results.
Computational cost may be high, since the iterative denoising of a diffusion model can lead to long processing times.
Generalization performance still needs to be verified across a wider range of environments and datasets.