Daily Arxiv

This page curates AI-related papers published worldwide.
All content is summarized using Google Gemini, and the site is operated on a non-profit basis.
Copyright for each paper belongs to its authors and their institutions; please credit the source when sharing.

SGDFuse: SAM-Guided Diffusion for High-Fidelity Infrared and Visible Image Fusion

Created by
  • Haebom

Authors

Xiaoyang Zhang, Zhen Hua, Yakun Ju, Wei Zhou, Jun Liu, Alex C. Kot

Outline

This paper proposes SGDFuse, a conditional diffusion model guided by the Segment Anything Model (SAM), to address shortcomings of existing infrared-visible image fusion (IVIF) methods: shallow semantic understanding, artifact generation, and loss of detail. SGDFuse leverages the high-quality semantic masks produced by SAM as explicit priors to steer the fusion process through a conditional diffusion model. The method runs in two stages: it first performs a preliminary fusion of the multimodal features, and then a diffusion model denoises from coarse to fine, conditioned on the SAM semantic masks and the preliminary fused image. This ensures both semantic directionality and high fidelity in the final result. Experiments show that SGDFuse achieves state-of-the-art performance in subjective and objective evaluations, as well as strong applicability to downstream tasks. The source code is available on GitHub.
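The summary describes a two-stage pipeline: preliminary fusion of the multimodal features, followed by conditional diffusion denoising guided by SAM masks and the coarse fused image. Below is a minimal PyTorch sketch of that idea, not the authors' implementation: the module names (PreliminaryFusion, ConditionalDenoiser) and the simplified DDPM training step are hypothetical, and the real SGDFuse network is considerably more elaborate.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PreliminaryFusion(nn.Module):
    """Stage 1 (hypothetical): coarse fusion of infrared and visible inputs."""
    def __init__(self, channels=64):
        super().__init__()
        self.enc_ir = nn.Conv2d(1, channels, 3, padding=1)   # 1-channel infrared
        self.enc_vis = nn.Conv2d(3, channels, 3, padding=1)  # 3-channel visible
        self.fuse = nn.Conv2d(2 * channels, 3, 3, padding=1)

    def forward(self, ir, vis):
        feats = torch.cat([self.enc_ir(ir), self.enc_vis(vis)], dim=1)
        return self.fuse(feats)  # coarse fused image

class ConditionalDenoiser(nn.Module):
    """Stage 2 (hypothetical): predicts the noise in x_t, conditioned on the
    SAM mask and the coarse fusion (timestep embedding omitted for brevity)."""
    def __init__(self, channels=64):
        super().__init__()
        # Input channels: noisy image (3) + coarse fusion (3) + SAM mask (1) = 7
        self.net = nn.Sequential(
            nn.Conv2d(7, channels, 3, padding=1), nn.SiLU(),
            nn.Conv2d(channels, 3, 3, padding=1),
        )

    def forward(self, x_t, coarse, sam_mask):
        return self.net(torch.cat([x_t, coarse, sam_mask], dim=1))

def ddpm_training_step(denoiser, x0, coarse, sam_mask, alphas_bar):
    """One simplified DDPM step: sample t, noise x0 to x_t, regress the noise."""
    b = x0.size(0)
    t = torch.randint(0, len(alphas_bar), (b,))
    a_bar = alphas_bar[t].view(b, 1, 1, 1)
    eps = torch.randn_like(x0)
    x_t = a_bar.sqrt() * x0 + (1 - a_bar).sqrt() * eps  # q(x_t | x_0)
    return F.mse_loss(denoiser(x_t, coarse, sam_mask), eps)
```

In this sketch, conditioning is done by simply concatenating the SAM mask and the coarse fused image with the noisy input along the channel dimension, a common way to inject spatial conditions into a diffusion denoiser.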

Takeaways, Limitations

Takeaways:
Demonstrates that SAM can be used to obtain semantically rich, high-quality infrared-visible fusion results.
Effectively mitigates the artifact generation and detail loss of existing methods.
Strong applicability to downstream tasks, indicating high practical potential.
Achieves state-of-the-art performance.
Open-source code supports reproducibility and extensibility.
Limitations:
Performance depends on SAM: degraded SAM segmentation can degrade SGDFuse output.
Computationally expensive: because the method is diffusion-based, inference may be slow.
SAM may segment certain types of images poorly, which could in turn hurt fusion quality on those images.