Daily Arxiv

This is a page that curates AI-related papers published worldwide.
All content here is summarized using Google Gemini, and the page is operated on a non-profit basis.
Copyright for each paper belongs to the authors and their institutions; please make sure to credit the source when sharing.

CulturalFrames: Assessing Cultural Expectation Alignment in Text-to-Image Models and Evaluation Metrics

Created by
  • Haebom

Author

Shravan Nayak, Mehar Bhatia, Xiaofeng Zhang, Verena Rieser, Lisa Anne Hendricks, Sjoerd van Steenkiste, Yash Goyal, Karolina Stańczak, Aishwarya Agrawal

Outline

This paper addresses concerns about whether text-to-image (T2I) models accurately represent diverse cultural contexts, and presents the first study to systematically quantify how well T2I models and evaluation metrics align with both explicit and implicit cultural expectations. To this end, the authors introduce CulturalFrames, a novel benchmark spanning ten countries and five sociocultural domains. CulturalFrames comprises 983 prompts, 3,637 images generated by four state-of-the-art T2I models, and over 10,000 detailed human annotations. The results show that, across models and countries, cultural expectations are missed 44% of the time on average. Among these failures, explicit expectations are missed at a surprisingly high average rate of 68%, and implicit expectations at 49%. Furthermore, existing T2I evaluation metrics correlate poorly with human judgments of cultural alignment, regardless of their internal reasoning. In conclusion, the study exposes important gaps, provides a concrete testbed, and suggests actionable directions for developing culturally informed T2I models and metrics that improve global usability.
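To make the metric-versus-human comparison concrete, the following is a minimal sketch of how a rank correlation between an automatic metric and human ratings could be computed. It is an illustration only: this summary does not say which correlation statistic the authors used, so Spearman's rank correlation is an assumption, and the score lists are hypothetical placeholder values, not data from the paper.

```python
from statistics import mean


def rank(values):
    """Return 1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Extend j to the end of the run of values tied with position i.
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(xs, ys):
    """Spearman correlation: Pearson correlation computed on the ranks."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    rx, ry = rank(xs), rank(ys)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (var_x * var_y) ** 0.5


# Hypothetical per-image scores (NOT from the paper): human ratings of
# cultural alignment and the scores an automatic metric gave the same images.
human_scores = [5, 2, 4, 1, 3, 4]
metric_scores = [0.71, 0.69, 0.66, 0.70, 0.72, 0.68]

rho = spearman(human_scores, metric_scores)
print(f"Spearman correlation: {rho:.2f}")
```

A value near 1 would mean the metric orders images the way annotators do; values near 0 correspond to the kind of weak agreement the summary describes.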

Takeaways, Limitations

Takeaways:
Presents CulturalFrames, a new benchmark for quantitatively measuring how well T2I models meet cultural expectations.
Finds that T2I models miss cultural expectations 44% of the time on average; among these failures, explicit expectations are missed 68% of the time and implicit expectations 49%.
Shows that existing evaluation metrics correlate poorly with human judgments of cultural alignment.
Emphasizes the need to develop culturally sensitive T2I models and evaluation metrics.
Limitations:
The CulturalFrames benchmark covers ten countries and five sociocultural domains, so its findings may not generalize beyond them.
Subjectivity in human annotations may influence results.
The study evaluates four T2I models, so the results may not hold for every model.
Interpretation of implicit cultural expectations can be ambiguous.