Daily Arxiv

This page collects papers related to artificial intelligence published around the world.
Summaries are generated with Google Gemini, and the page is operated on a non-profit basis.
Copyright for each paper belongs to its authors and their institutions; when sharing, please cite the source.

What Drives Compositional Generalization in Visual Generative Models?

Created by
  • Haebom

Author

Karim Farid, Rajat Sahay, Yumna Ali Alnaggar, Simon Schrodi, Volker Fischer, Cordelia Schmid, Thomas Brox

Outline

This study systematically investigates the factors that drive compositional generalization in visual generative models. Specifically, we experimentally evaluate design choices that positively or negatively affect compositional generalization in image and video generation models. Our key findings reveal that whether the training objective is discrete or continuous, and the degree to which conditioning information about component concepts is provided, significantly influence compositional generalization. Furthermore, we show that compositional performance in discrete models such as MaskGIT can be improved by augmenting MaskGIT's discrete loss with an auxiliary JEPA-based continuous objective. A hedged sketch of this combined objective follows below.
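The sketch below illustrates one plausible way to combine a MaskGIT-style discrete masked-token loss with a JEPA-style continuous feature-prediction loss, as the summary describes. The function name `combined_loss`, the `aux_weight` hyperparameter, and the choice of smooth L1 as the feature-regression loss are all assumptions for illustration, not details confirmed by the paper.

```python
import torch
import torch.nn.functional as F

def combined_loss(logits, target_tokens, pred_features, ema_features, mask, aux_weight=0.5):
    """Hypothetical combination of a discrete masked-token objective with a
    JEPA-style continuous auxiliary objective (names and weighting assumed).

    logits:        (B, N, V) token predictions from a MaskGIT-style decoder
    target_tokens: (B, N)    ground-truth VQ token indices
    pred_features: (B, N, D) predicted continuous features at masked positions
    ema_features:  (B, N, D) target features from a frozen/EMA target encoder
    mask:          (B, N)    boolean mask, True at masked positions
    """
    # Standard MaskGIT objective: cross-entropy on the masked tokens only.
    ce = F.cross_entropy(
        logits[mask],         # (M, V) rows for masked positions
        target_tokens[mask],  # (M,) token indices
    )
    # JEPA-style auxiliary objective: regress continuous target features;
    # no gradients flow through the target encoder (detach).
    jepa = F.smooth_l1_loss(pred_features[mask], ema_features[mask].detach())
    return ce + aux_weight * jepa
```

The key design point, per the summary, is that the auxiliary term gives the discrete model a continuous training signal; how the continuous predictions and targets are actually produced is left unspecified here.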

Takeaways, Limitations

Takeaways:
Identifies key factors behind compositional generalization in visual generative models.
Highlights the importance of discrete vs. continuous training objectives.
Presents a method to improve the compositional performance of discrete models such as MaskGIT.
Limitations:
Further research is needed to determine generalizability beyond the specific models and datasets studied.
Further analysis is needed to explain why the JEPA-based auxiliary objective improves performance.
Other factors influencing compositional generalization remain unexplored.