Daily Arxiv

This page curates AI-related papers published worldwide.
All content is summarized using Google Gemini, and the page is operated on a non-profit basis.
Copyright for each paper belongs to the authors and their institutions; please make sure to credit the source when sharing.

Interact-Custom: Customized Human Object Interaction Image Generation

Created by
  • Haebom

Authors

Zhu Xu, Zhaowen Wang, Yuxin Peng, Yang Liu

Overview

This paper addresses compositional customized image generation, which combines multiple target concepts in a single generated image. Existing work has focused mainly on preserving the appearance of the target entities and has overlooked fine-grained control of the interactions between them. Focusing on human-object interaction (HOI) scenarios, the paper proposes a new task, Customized Human Object Interaction Image Generation (CHOI), which requires both preserving the identities of the target human and object and controlling the interaction semantics between them. CHOI poses two key challenges: (1) preserving identity while controlling interaction requires decomposing the human and object into self-contained identity features and pose-based interaction features, but existing HOI image datasets do not provide suitable samples for learning this decomposition; and (2) an inappropriate spatial configuration between the human and the object can cause the desired interaction semantics to be lost.

To address these challenges, the authors first process a large-scale dataset in which each sample contains the same human-object pair in different interaction poses, and then design a two-stage model, Interact-Custom. The model first explicitly models the spatial configuration by generating a foreground mask that depicts the interaction. Guided by this mask, it then generates the target human and object interacting while preserving their identity features. Interact-Custom also optionally lets users specify a background image and the union location where the target human and object should appear, which gives a high degree of content control. Extensive experiments with metrics tailored to the CHOI task demonstrate the effectiveness of the approach.
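The two-stage flow described above (mask first, then mask-guided generation) can be sketched as a data-flow outline. This is only an illustrative sketch under stated assumptions, not the authors' implementation: the names `ChoiRequest`, `generate_foreground_mask`, `generate_image`, and `two_stage_sketch` are hypothetical, identities are represented by plain strings instead of learned features, and both stages are trivial placeholders standing in for learned models.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Box = Tuple[int, int, int, int]  # (x0, y0, x1, y1), upper bounds exclusive
Mask = List[List[int]]           # 1 = foreground (human + object), 0 = background


@dataclass
class ChoiRequest:
    """Hypothetical container for the inputs mentioned in the summary."""
    human_identity: str                # stand-in for the target human's identity features
    object_identity: str               # stand-in for the target object's identity features
    interaction: str                   # desired interaction semantics, e.g. "ride"
    background: Optional[str] = None   # optional user-provided background
    union_box: Optional[Box] = None    # optional region for the human-object pair


def generate_foreground_mask(req: ChoiRequest, size: int = 8) -> Mask:
    """Stage 1 placeholder.

    A real model would predict a mask whose shape depicts the interaction.
    Here we only fill the requested region (or the whole canvas).
    """
    x0, y0, x1, y1 = req.union_box or (0, 0, size, size)
    return [
        [1 if x0 <= x < x1 and y0 <= y < y1 else 0 for x in range(size)]
        for y in range(size)
    ]


def generate_image(req: ChoiRequest, mask: Mask) -> List[List[str]]:
    """Stage 2 placeholder.

    A real model would synthesize pixels guided by the mask while preserving
    identity. Here each cell is labeled as foreground or background content.
    """
    foreground = f"{req.human_identity}+{req.object_identity}:{req.interaction}"
    background = req.background or "generated-background"
    return [[foreground if cell else background for cell in row] for row in mask]


def two_stage_sketch(req: ChoiRequest) -> Tuple[Mask, List[List[str]]]:
    """Run stage 1, then feed its mask into stage 2."""
    mask = generate_foreground_mask(req)
    return mask, generate_image(req, mask)
```

Calling `two_stage_sketch(ChoiRequest("person_A", "bike_B", "ride", background="street", union_box=(2, 2, 6, 6)))` returns the mask together with a grid labeled by that mask. The point of the sketch is only the ordering: the spatial configuration is fixed before any appearance is generated.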

Takeaways, Limitations

Takeaways:
Introduces CHOI, a new task in human-object interaction image generation, and proposes Interact-Custom, an effective model for it.
Achieves identity preservation and interaction control for humans and objects at the same time.
Gives users a high degree of content control, including an optional background image and target location.
Presents a training strategy based on a large-scale dataset of identical human-object pairs in different interaction poses.
Limitations:
The evaluation relies on metrics tailored to the CHOI task, so the reported performance may depend on those specific metrics.
Generalization to a wide range of human-object interaction types requires further validation.
The authors built a new dataset to overcome the limitations of existing HOI image datasets, but its scale and diversity may still be limited.
The model's ability to handle complex and diverse interaction scenarios may be limited.