Daily Arxiv

This is a page that curates AI-related papers published worldwide.
All content here is summarized using Google Gemini, and the page is operated on a non-profit basis.
Copyright for each paper belongs to the authors and their institutions; please make sure to credit the source when sharing.

LoRA.rar: Learning to Merge LoRAs via Hypernetworks for Subject-Style Conditioned Image Generation

Created by
  • Haebom

Authors

Donald Shenaj, Ondrej Bohdal, Mete Ozay, Pietro Zanuttigh, Umberto Michieli

Outline

This paper advances state-of-the-art image generation models toward personalized generation that combines a user-specified subject (content) with a user-specified style. Prior work achieves this by merging low-rank adapters (LoRAs) through optimization-based methods, which are computationally expensive and unsuitable for real-time use on resource-constrained devices such as smartphones. To address this, the paper proposes LoRA.rar, a method that improves image quality while speeding up the merging process by more than 4,000x. A hypernetwork is pretrained on diverse content-style LoRA pairs to learn an efficient merging strategy that generalizes to unseen content-style pairs, enabling fast, high-quality personalization. The authors also identify limitations in existing metrics for content-style quality and propose a new evaluation protocol that uses multimodal large language models (MLLMs) for more accurate assessment. Both MLLM-based and human evaluations show that the method outperforms the state of the art in content and style fidelity.
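To make the idea of a learned merging strategy concrete, here is a minimal sketch in plain Python. It is not the authors' implementation. It assumes, for illustration only, that the merged weight update is a per-column weighted sum of the content and style LoRA updates, and that the coefficients come from a small network that sees the corresponding columns of both updates. The names `tiny_hypernetwork` and `merge_loras`, the single linear layer, and the random weights are all hypothetical; a real hypernetwork would be pretrained on many content-style LoRA pairs.

```python
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def lora_delta(B, A):
    """Low-rank weight update delta_W = B @ A, with B (d_out x r) and A (r x d_in)."""
    return matmul(B, A)


def column(m, j):
    """Return column j of a matrix as a flat list."""
    return [row[j] for row in m]


def tiny_hypernetwork(col_c, col_s, weights, bias):
    """Hypothetical stand-in for a pretrained hypernetwork.

    A single linear layer maps the concatenated content and style columns
    to two merging coefficients (m_c, m_s). This architecture is assumed.
    """
    features = col_c + col_s
    return [sum(w * f for w, f in zip(w_row, features)) + b
            for w_row, b in zip(weights, bias)]


def merge_loras(delta_c, delta_s, weights, bias):
    """Merge two LoRA updates column by column with predicted coefficients.

    One forward pass per column replaces a per-pair optimization loop,
    which is where a large speedup would come from in this kind of scheme.
    """
    d_out, d_in = len(delta_c), len(delta_c[0])
    merged = [[0.0] * d_in for _ in range(d_out)]
    for j in range(d_in):
        m_c, m_s = tiny_hypernetwork(column(delta_c, j), column(delta_s, j),
                                     weights, bias)
        for i in range(d_out):
            merged[i][j] = m_c * delta_c[i][j] + m_s * delta_s[i][j]
    return merged


# Toy usage with rank-1 adapters and random (untrained) hypernetwork weights.
rng = random.Random(0)
d_out, d_in, rank = 4, 3, 1
B_content = [[rng.gauss(0, 1) for _ in range(rank)] for _ in range(d_out)]
A_content = [[rng.gauss(0, 1) for _ in range(d_in)] for _ in range(rank)]
B_style = [[rng.gauss(0, 1) for _ in range(rank)] for _ in range(d_out)]
A_style = [[rng.gauss(0, 1) for _ in range(d_in)] for _ in range(rank)]

hyper_w = [[rng.gauss(0, 0.1) for _ in range(2 * d_out)] for _ in range(2)]
hyper_b = [1.0, 1.0]  # with zero weights this reduces to plain addition

merged_delta = merge_loras(lora_delta(B_content, A_content),
                           lora_delta(B_style, A_style),
                           hyper_w, hyper_b)
```

With zero hypernetwork weights and a bias of `[1.0, 1.0]`, the sketch reduces to simply adding the two LoRA updates, which makes a convenient sanity check before swapping in trained weights.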

Takeaways, Limitations

Takeaways:
The LoRA.rar method merges LoRAs for personalized image generation more than 4,000 times faster than existing optimization-based LoRA merging methods.
It presents an efficient LoRA merging strategy that generalizes to varied content-style combinations.
It proposes a novel content-style quality assessment protocol that uses MLLMs.
It improves image quality and merging speed simultaneously.
Limitations:
Further research is needed to determine the generality and objectivity of the proposed MLLM-based assessment protocol.
The performance of the LoRA.rar method may depend on the performance of the pre-trained hypernetwork.
Further validation is needed for compatibility with various image generation models.