Daily Arxiv

This is a page that curates AI-related papers published worldwide.
All content here is summarized using Google Gemini and operated on a non-profit basis.
Copyright for each paper belongs to the authors and their institutions; please make sure to credit the source when sharing.

DPDEdit: Detail-Preserved Diffusion Models for Multimodal Fashion Image Editing

Created by
  • Haebom

Authors

Xiaolong Wang, Zhi-Qi Cheng, Jue Wang, Xiaojiang Peng

Outline

This paper proposes a novel multimodal fashion image editing architecture, the Detail-Preserving Diffusion Model (DPDEdit). Built on a latent diffusion model, DPDEdit integrates text prompts, region masks, human pose images, and clothing texture images to guide fashion image generation. Grounded-SAM predicts the editing region from the user's textual description, and local editing is performed by combining that region with the other conditions. To transfer the details of a given clothing texture to the target fashion image, the authors propose a texture injection and enhancement mechanism that preserves the high-frequency details of the generated clothing texture using a separate cross-attention layer and an auxiliary U-Net. Furthermore, they extend the VITON-HD dataset with a multimodal large language model to generate paired samples of texture images and textual descriptions. Experimental results demonstrate that DPDEdit outperforms state-of-the-art methods in image fidelity and in consistency with the given multimodal input.
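To make the texture-injection idea concrete, the sketch below shows one plausible way a separate cross-attention layer could fuse texture features (e.g., from an auxiliary U-Net) into the main U-Net's hidden states. This is a minimal illustration, not the paper's implementation; the module name, feature dimensions, and residual wiring are all assumptions for the sketch.

```python
import torch
import torch.nn as nn


class TextureCrossAttention(nn.Module):
    """Hypothetical extra cross-attention layer: U-Net hidden states attend
    over texture features produced by an auxiliary encoder/U-Net."""

    def __init__(self, dim: int, tex_dim: int, heads: int = 4):
        super().__init__()
        # kdim/vdim let keys and values come from the texture feature space
        self.attn = nn.MultiheadAttention(
            dim, heads, kdim=tex_dim, vdim=tex_dim, batch_first=True
        )
        self.norm = nn.LayerNorm(dim)

    def forward(self, hidden: torch.Tensor, tex_feats: torch.Tensor) -> torch.Tensor:
        # hidden:    (B, N, dim)     flattened U-Net spatial tokens
        # tex_feats: (B, M, tex_dim) texture tokens from the auxiliary branch
        out, _ = self.attn(self.norm(hidden), tex_feats, tex_feats)
        return hidden + out  # residual add keeps the base features intact


# Usage sketch: inject 32-dim texture tokens into 64-dim U-Net tokens
layer = TextureCrossAttention(dim=64, tex_dim=32)
hidden = torch.randn(2, 10, 64)
textures = torch.randn(2, 7, 32)
fused = layer(hidden, textures)  # same shape as `hidden`: (2, 10, 64)
```

The residual connection is a common design choice here: it lets the texture branch refine, rather than overwrite, the base denoising features.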

Takeaways, Limitations

Takeaways:
Effectively leverages multimodal inputs (text, masks, poses, textures) for accurate, detail-preserving fashion image editing.
Addresses edit-region identification via Grounded-SAM and texture-detail preservation via the texture injection and enhancement mechanism.
The extended VITON-HD dataset enables model training on richer paired data.
Achieves state-of-the-art performance in image fidelity and multimodal consistency.
Limitations:
No analysis of the computational cost or inference time of the proposed method.
Generalization across diverse fashion styles and clothing types needs further evaluation.
No actual user-interface implementation or usability evaluation.
Because the method depends on Grounded-SAM, that model's limitations may propagate to DPDEdit.