Daily Arxiv

This page collects papers on artificial intelligence published around the world.
Summaries are generated with Google Gemini, and the page is operated on a non-profit basis.
Copyright of each paper belongs to its authors and institutions; when sharing, please cite the source.

Reframing Image Difference Captioning with BLIP2IDC and Synthetic Augmentation

Created by
  • Haebom

Author

Gautier Evennou, Antoine Chaffin, Vivien Chappelier, Ewa Kijak

Outline

As generative models have made image editing easier, Image Difference Captioning (IDC), the task of describing the differences between two images, has become increasingly important. Existing IDC models perform well on 3D-rendered scenes but struggle with real-world images, both because training data is scarce and because subtle differences in complex images are hard to capture. To address these challenges, this paper proposes a simple yet effective framework that adapts existing image captioning models to the IDC task and augments IDC datasets. Specifically, the authors develop BLIP2IDC, an adaptation of BLIP2 to IDC that outperforms existing two-stream approaches, and introduce Syned1, a new dataset built with synthetic data augmentation that further improves IDC model performance.
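The summary does not spell out how BLIP2's single-image input is adapted for image pairs. One common way to feed a pair to a single-image captioner is to join the "before" and "after" images into one input image; the side-by-side layout below is an assumption for illustration, not necessarily the paper's exact conditioning scheme. A minimal Pillow sketch:

```python
from PIL import Image

def concat_pair(img_a: Image.Image, img_b: Image.Image) -> Image.Image:
    """Join two same-size images side by side into one input image."""
    w, h = img_a.size
    canvas = Image.new("RGB", (w * 2, h))
    canvas.paste(img_a, (0, 0))   # "before" image on the left
    canvas.paste(img_b, (w, 0))   # "after" image on the right
    return canvas

# Dummy stand-ins for a real before/after pair.
before = Image.new("RGB", (224, 224), "white")
after = Image.new("RGB", (224, 224), "black")
pair = concat_pair(before, after)
# `pair` can then be passed to a captioning model's image processor
# as a single image, with the caption target being the difference text.
```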

Takeaways, Limitations

Takeaways:
  • The BLIP2IDC model improves performance on IDC tasks.
  • Synthetic data augmentation is presented as a novel way to improve IDC model performance.
  • Syned1, a new dataset suited to real-world IDC tasks, is introduced.
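The synthetic augmentation above implies producing (before image, after image, difference caption) triples. A plausible recipe, sketched here as an assumption rather than the paper's exact pipeline, is to apply an instruction-guided image editor to a real image and reuse the edit instruction as the caption; `edit_fn` below is a hypothetical stand-in for such an editor:

```python
from dataclasses import dataclass

@dataclass
class IDCExample:
    """One image-difference-captioning training triple."""
    before_path: str
    after_path: str
    caption: str

def make_synthetic_example(image_path: str, edit_instruction: str, edit_fn) -> IDCExample:
    # edit_fn stands in for an instruction-guided editing model
    # (hypothetical here); it returns the path of the edited image.
    after_path = edit_fn(image_path, edit_instruction)
    return IDCExample(image_path, after_path, edit_instruction)

# Demo with a dummy editor that only derives an output path.
dummy_editor = lambda path, instr: path.replace(".jpg", "_edited.jpg")
ex = make_synthetic_example("cat.jpg", "add a red hat to the cat", dummy_editor)
```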
Limitations:
The paper's abstract does not explicitly mention any limitations.