Daily Arxiv

This is a page that curates AI-related papers published worldwide.
All content here is summarized using Google Gemini and operated on a non-profit basis.
Copyright for each paper belongs to the authors and their institutions; please make sure to credit the source when sharing.

PromptDresser: Improving the Quality and Controllability of Virtual Try-On via Generative Textual Prompt and Prompt-aware Mask

Created by
  • Haebom

Authors

Jeongho Kim, Hoiyeong Jin, Sunghyun Park, Jaegul Choo

Outline

This paper addresses text-editable virtual try-on, building on recent approaches that fine-tune pre-trained text-to-image diffusion models for their strong generative capabilities. The task changes a person's clothing based on a provided garment image while also editing the wear style (e.g., tuck-in style, fit) according to a text description.

The authors identify three key challenges: (i) designing rich text descriptions of paired person-clothing data for model training; (ii) resolving conflicts where textual information about the person's original clothing interferes with generating the new garment; and (iii) adaptively adjusting the inpainting mask based on the text description, so that the editing region is appropriate while the person's appearance unrelated to the new clothing is preserved.

To address these challenges, the paper proposes PromptDresser, a text-editable virtual try-on model that leverages large multimodal models (LMMs) to enable high-quality, versatile manipulation via text prompts. Through in-context learning, the LMM generates detailed text descriptions of person and garment images, including fine details and editable attributes, with minimal human intervention. The inpainting mask is also adaptively adjusted based on the text prompt to secure an appropriate editing region. Experiments show that PromptDresser outperforms existing methods, demonstrating strong text-based control and diverse garment manipulation.
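The two mechanisms above can be sketched in miniature: an LMM prompted with in-context examples to describe a garment, and a mask that widens or stays tight depending on the requested wear style. This is an illustrative sketch only; the function names, the example prompt format, and the rule-based mask logic are assumptions, not the authors' implementation.

```python
# Illustrative sketch of PromptDresser's two ideas (not the paper's code):
# 1) build an in-context-learning prompt for an LMM to describe clothing;
# 2) adapt an inpainting mask to the text prompt's wear style.

# A single hand-written (image caption, description) pair serving as the
# in-context example; a real system would use several curated pairs.
IN_CONTEXT_EXAMPLES = [
    ("a person wearing a white shirt",
     "white cotton shirt, long sleeves, buttoned, tucked in"),
]

def build_lmm_prompt(image_caption: str) -> str:
    """Assemble the in-context-learning prompt.

    A real pipeline would send this (with the actual images) to a
    multimodal model and parse its generated description; here we only
    construct the prompt text.
    """
    examples = "\n".join(
        f"Image: {img}\nDescription: {desc}"
        for img, desc in IN_CONTEXT_EXAMPLES
    )
    return f"{examples}\nImage: {image_caption}\nDescription:"

def adapt_mask(base_mask: set[tuple[int, int]], edit_text: str) -> set[tuple[int, int]]:
    """Toy prompt-aware mask adjustment.

    Expands the editable region downward when the text implies a loose or
    untucked style (so the garment can extend past the waistline), and
    keeps the tight base mask otherwise, preserving the rest of the person.
    """
    if any(word in edit_text.lower() for word in ("untucked", "loose", "oversized")):
        return base_mask | {(x, y + 1) for x, y in base_mask}
    return base_mask
```

In this toy form, `adapt_mask({(0, 0)}, "untucked shirt")` grows the mask while `adapt_mask({(0, 0)}, "tucked in")` leaves it unchanged, mirroring how the paper ties the editing region to the requested style rather than using a fixed inpainting mask.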

Takeaways, Limitations

Takeaways:
Presents a new virtual try-on model that enables fine-grained control of clothing style and fit via text prompts alongside a garment image.
Achieves high-quality, diverse garment manipulation using large multimodal models (LMMs).
Automatically generates rich text descriptions with minimal human effort through in-context learning.
Improves image quality by conveying clothing details that are difficult to capture from images alone.
Outperforms existing methods.
Limitations:
Performance evaluation may be limited to specific datasets.
Further research is needed on generalization across clothing types and body shapes.
Ambiguous or misinterpreted text prompts can cause editing errors.
Heavy reliance on the LMM means its quality directly bounds overall performance.