Daily Arxiv

This page collects papers on artificial intelligence published around the world.
Summaries are generated with Google Gemini, and the page is operated on a non-profit basis.
Copyright for each paper remains with its authors and their institutions; when sharing, simply cite the source.

MMPB: It's Time for Multi-Modal Personalization

Created by
  • Haebom

Authors

Jaeik Kim, Woojin Kim, Woohyeon Park, Jaeyoung Do

Outline

This paper addresses visual personalization, a crucial capability for user-centric AI systems. It introduces MMPB, the first comprehensive benchmark for evaluating the personalization capabilities of large Vision-Language Models (VLMs). MMPB comprises 10,000 image-query pairs and 111 personalizable concepts across four categories (human, animal, object, and character), with preference-based queries included in the human category. Personalization is structured into three main task types, and an evaluation of 23 widely used VLMs on these tasks shows that most struggle with personalization.
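To make the benchmark's structure concrete, below is a minimal sketch of how an MMPB-style evaluation record and scoring loop might look. The field names, the `query_vlm` callable, and the exact-match scoring are illustrative assumptions, not the paper's actual schema, API, or metric.

```python
from dataclasses import dataclass

# Hypothetical layout for one MMPB-style benchmark item.
# Field names are assumptions for illustration; the paper's
# actual schema is not specified in this summary.
@dataclass
class PersonalizationItem:
    concept_id: str   # one of the 111 personalizable concepts
    category: str     # "human", "animal", "object", or "character"
    image_path: str   # image depicting the personalized concept
    query: str        # question referencing the concept
    reference: str    # expected answer

def evaluate(items, query_vlm):
    """Score a VLM on benchmark items via exact-match accuracy.

    `query_vlm(image_path, query)` is a placeholder for whatever
    inference call the model under test exposes.
    """
    if not items:
        return 0.0
    correct = sum(
        query_vlm(item.image_path, item.query).strip().lower()
        == item.reference.strip().lower()
        for item in items
    )
    return correct / len(items)
```

In practice a benchmark like this would likely use a more forgiving metric than exact match (e.g., multiple-choice or judge-based scoring), but the record-per-concept structure is the point of the sketch.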

Takeaways, Limitations

Takeaways:
  • MMPB is the first benchmark for systematically evaluating the personalization capabilities of VLMs.
  • Personalization performance is evaluated across a wide range of VLMs, including both open and closed models.
  • Most VLMs struggle with personalization tasks, particularly maintaining consistency across a conversation, handling user preferences, and adapting to visual cues.
  • The study identifies key challenges in VLM personalization and suggests directions for future research.
Limitations:
  • The paper offers little concrete methodology for improving the personalization performance of VLMs.
  • The findings may depend on the specific models and benchmark data used.
  • The benchmark may not cover every aspect of VLM personalization.