Daily Arxiv

This page organizes papers related to artificial intelligence published around the world.
This page is summarized using Google Gemini and is operated on a non-profit basis.
Copyright of each paper belongs to its authors and their institutions. When sharing, please cite the source.

BlobCtrl: Taming Controllable Blob for Element-level Image Editing

Created by
  • Haebom

Authors

Yaowei Li, Lingen Li, Zhaoyang Zhang, Xiaoyu Li, Guangzhi Wang, Hongxiang Li, Xiaodong Cun, Ying Shan, Yuexian Zou

BlobCtrl: Element-Level Image Editing with Blob-Based Representation

Outline

BlobCtrl is a diffusion-based framework for precise manipulation of individual visual elements in an image. By treating blobs as visual primitives, it decouples layout from appearance, which enables precise element-level manipulation. Key contributions include: (1) an in-context dual-branch diffusion model that processes foreground and background separately and uses a unified blob representation to explicitly decouple layout from appearance; (2) a self-supervised split-and-reconstruct training paradigm with an identity-preserving loss function, together with a tailored strategy for using blob-image pairs efficiently. To facilitate research, the authors introduce BlobData for large-scale training and BlobBench for systematic evaluation. Experimental results show that BlobCtrl achieves state-of-the-art performance while remaining computationally efficient across a variety of element-level editing operations, such as adding, removing, resizing, and replacing objects.
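To make the idea of a blob as a layout primitive concrete, here is a minimal sketch. The summary above does not say how a blob is parameterized, so this assumes a blob is an ellipse (center, two semi-axes, rotation) fitted to a binary object mask from its second moments. The names `Blob`, `fit_blob`, `move`, `resize`, and `rasterize` are hypothetical and are not taken from the BlobCtrl code. The sketch covers only the layout side; appearance would be handled by the diffusion model.

```python
import math
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Blob:
    """Hypothetical ellipse-shaped blob: layout only, no appearance."""
    cx: float     # center x (pixels)
    cy: float     # center y (pixels)
    a: float      # semi-major axis (pixels)
    b: float      # semi-minor axis (pixels)
    theta: float  # rotation of the major axis (radians)


def fit_blob(mask):
    """Fit an ellipse to a binary mask (list of rows of 0/1) via second moments."""
    pts = [(x, y) for y, row in enumerate(mask) for x, v in enumerate(row) if v]
    if not pts:
        raise ValueError("mask contains no foreground pixels")
    n = len(pts)
    cx = sum(x for x, _ in pts) / n
    cy = sum(y for _, y in pts) / n
    sxx = sum((x - cx) ** 2 for x, _ in pts) / n
    syy = sum((y - cy) ** 2 for _, y in pts) / n
    sxy = sum((x - cx) * (y - cy) for x, y in pts) / n
    # Eigenvalues of the 2x2 covariance matrix [[sxx, sxy], [sxy, syy]].
    half_trace = (sxx + syy) / 2
    root = math.hypot((sxx - syy) / 2, sxy)
    lam_major = half_trace + root
    lam_minor = max(half_trace - root, 0.0)
    theta = 0.5 * math.atan2(2 * sxy, sxx - syy)
    # A uniformly filled ellipse has variance (semi-axis)^2 / 4 along each axis.
    return Blob(cx, cy, 2 * math.sqrt(lam_major), 2 * math.sqrt(lam_minor), theta)


def move(blob, dx, dy):
    """Layout edit: translate the blob without touching its shape."""
    return replace(blob, cx=blob.cx + dx, cy=blob.cy + dy)


def resize(blob, scale):
    """Layout edit: scale both semi-axes by the same factor."""
    return replace(blob, a=blob.a * scale, b=blob.b * scale)


def rasterize(blob, width, height):
    """Render the blob back to a binary mask of the given size."""
    cos_t, sin_t = math.cos(blob.theta), math.sin(blob.theta)
    a = max(blob.a, 1e-6)  # guard against degenerate (zero-width) blobs
    b = max(blob.b, 1e-6)
    mask = []
    for y in range(height):
        row = []
        for x in range(width):
            dx, dy = x - blob.cx, y - blob.cy
            u = dx * cos_t + dy * sin_t    # coordinate along the major axis
            v = -dx * sin_t + dy * cos_t   # coordinate along the minor axis
            row.append(1 if (u / a) ** 2 + (v / b) ** 2 <= 1.0 else 0)
        mask.append(row)
    return mask
```

Under this assumption, an edit such as moving or resizing an object reduces to changing a handful of numbers, for example `resize(move(fit_blob(mask), 5, 0), 1.5)`, and the edited blob can be rasterized again as a layout condition.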

Takeaways, Limitations

Takeaways:
A blob-based representation enables precise, element-level image editing.
Separating layout from appearance makes image manipulation more flexible.
The method achieves state-of-the-art results on editing operations such as adding, removing, resizing, and replacing objects.
BlobData and BlobBench are provided to facilitate further research.
Limitations:
The paper does not explicitly state its limitations. (Since the paper reports that the method remains computationally efficient, computational cost is unlikely to be one of them.)