Daily Arxiv

This page curates AI-related papers published worldwide.
All content is summarized using Google Gemini, and the page is operated on a non-profit basis.
Copyright for each paper belongs to the authors and their institutions; please make sure to credit the source when sharing.

UniF$^2$ace: Fine-grained Face Understanding and Generation with Unified Multimodal Models

Created by
  • Haebom

Authors

Junzhe Li, Xuerui Qiu, Linrui Xu, Liya Guo, Delin Qu, Tingting Long, Chun Fan, Ming Li

Outline

UniF$^2$ace is presented as the first unified multimodal model (UMM) specialized in fine-grained face understanding and generation. Whereas previous work mostly focuses on coarse-grained understanding of facial attributes, UniF$^2$ace is designed to both understand and generate fine-grained facial details. We build a large-scale face dataset, UniF$^2$ace-130K, consisting of 130,000 image-text pairs and one million question-answer pairs, and train the model using two complementary diffusion techniques together with a two-level mixture-of-experts architecture. We establish a theoretical connection between discrete diffusion score matching and masked generative models, optimizing the evidence lower bound to improve the synthesis of facial details. By introducing token-level and sequence-level mixtures of experts, the model learns efficient fine-grained representations for both understanding and generation tasks. Extensive experiments on UniF$^2$ace-130K show that UniF$^2$ace outperforms existing UMMs and generative models.
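To make the two-level routing idea concrete, below is a minimal PyTorch-style sketch of the general mechanism: a token-level router sends each token to its top-k experts, while a sequence-level gate weights experts using a pooled representation of the whole sequence. All module names, dimensions, and the top-k routing scheme are illustrative assumptions, not the paper's implementation.

```python
# Minimal sketch (not the paper's code) of token-level plus sequence-level
# mixture-of-experts routing. Sizes and expert definitions are assumptions.
import torch
import torch.nn as nn

class TokenLevelMoE(nn.Module):
    def __init__(self, dim=512, num_experts=4, top_k=2):
        super().__init__()
        self.experts = nn.ModuleList(
            [nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))
             for _ in range(num_experts)]
        )
        self.router = nn.Linear(dim, num_experts)
        self.top_k = top_k

    def forward(self, x):                       # x: (batch, seq_len, dim)
        logits = self.router(x)                 # (batch, seq_len, num_experts)
        weights, idx = logits.topk(self.top_k, dim=-1)
        weights = weights.softmax(dim=-1)       # normalize over the selected experts
        out = torch.zeros_like(x)
        for k in range(self.top_k):             # route each token to its top-k experts
            for e, expert in enumerate(self.experts):
                mask = (idx[..., k] == e)       # tokens assigned to expert e at rank k
                if mask.any():
                    out[mask] += weights[..., k][mask].unsqueeze(-1) * expert(x[mask])
        return out

class SequenceLevelMoE(nn.Module):
    def __init__(self, dim=512, num_experts=2):
        super().__init__()
        self.experts = nn.ModuleList([nn.Linear(dim, dim) for _ in range(num_experts)])
        self.gate = nn.Linear(dim, num_experts)

    def forward(self, x):                       # x: (batch, seq_len, dim)
        pooled = x.mean(dim=1)                  # one gating decision per sequence
        g = self.gate(pooled).softmax(dim=-1)   # (batch, num_experts)
        expert_out = torch.stack([e(x) for e in self.experts], dim=-1)
        return (expert_out * g[:, None, None, :]).sum(dim=-1)

x = torch.randn(2, 16, 512)                     # toy batch of token embeddings
y = SequenceLevelMoE()(TokenLevelMoE()(x))
print(y.shape)                                  # torch.Size([2, 16, 512])
```

The intuition is that token-level experts specialize in local facial details, while the sequence-level gate adapts the computation to the task (understanding vs. generation) as a whole; the actual expert designs in the paper may differ.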

Takeaways, Limitations

Takeaways:
Presents UniF$^2$ace, a novel UMM for fine-grained face understanding and generation.
Achieves better performance than existing UMMs and generative models.
Improves performance by exploiting a theoretical connection between discrete diffusion score matching and masked generative models.
Constructs the large-scale face dataset UniF$^2$ace-130K (130K image-text pairs, 1M question-answer pairs).
Enables efficient fine-grained representation learning via a token-level and sequence-level mixture-of-experts architecture.
Limitations:
Potential bias in the UniF$^2$ace-130K dataset and the model's generalization performance require further validation.
Limited detail on the metrics and settings used in comparative experiments with other UMMs.
No analysis of the model's computational cost and training time.
No performance evaluation in real-world applications.