Daily Arxiv

This is a page that curates AI-related papers published worldwide.
All content here is summarized using Google Gemini and operated on a non-profit basis.
Copyright for each paper belongs to the authors and their institutions; please make sure to credit the source when sharing.

OmniPlay: Benchmarking Omni-Modal Models on Omni-Modal Game Playing

Created by
  • Haebom

Authors

Fuqing Bie, Shiyu Huang, Xijia Tao, Zhiqin Fang, Leyi Pan, Junzhe Chen, Min Ren, Liuyu Xiang, Zhaofeng He

Outline

OmniPlay is a new benchmark for evaluating the intelligence of interactive agent models that draw on diverse sensory information. To overcome the limitations of existing benchmarks, it integrates multiple modalities, including visual, auditory, and temporal information, in an interactive game setting. It comprises five game environments designed to create interactions and conflicts between modalities, so that an agent's cross-modal reasoning can be assessed. An evaluation of six leading multimodal models found superhuman performance on high-fidelity memory tasks but significant failures on tasks that require robust reasoning and strategic planning. The authors attribute this fragility to brittle fusion mechanisms, which cause performance to degrade rapidly when modalities conflict. The evaluation also uncovered a "less is more" paradox, in which removing sensory information can improve performance. The authors conclude that research toward robust AGI requires more than simple scaling; it must explicitly address synergistic fusion.
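The "less is more" paradox described above can be checked with a simple modality-ablation comparison. The following is only a minimal sketch under stated assumptions, not the authors' evaluation code: the function name `less_is_more_cases`, the modality labels, and the scores in the usage example are all hypothetical placeholders and are not taken from the paper. It assumes each modality configuration has already been evaluated and reduced to one mean score where higher is better.

```python
from typing import Dict, FrozenSet, List, Tuple

Modalities = FrozenSet[str]


def less_is_more_cases(
    scores: Dict[Modalities, float],
) -> List[Tuple[Modalities, float]]:
    """Find ablations that beat the full-modality baseline.

    `scores` maps a set of enabled modalities to a mean score
    (higher is better). Returns (removed_modalities, gain) pairs,
    sorted by gain in descending order.
    """
    # Assumption: the configuration with the most modalities is the baseline.
    full = max(scores, key=len)
    baseline = scores[full]

    cases = []
    for subset, score in scores.items():
        # `subset < full` is a strict-subset test on frozensets.
        if subset < full and score > baseline:
            cases.append((full - subset, score - baseline))
    return sorted(cases, key=lambda case: case[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical placeholder scores, NOT results from the paper.
    example = {
        frozenset({"visual", "audio"}): 50.0,
        frozenset({"visual"}): 62.0,
        frozenset({"audio"}): 41.0,
    }
    for removed, gain in less_is_more_cases(example):
        print(f"removing {sorted(removed)} improves the score by {gain:.1f}")
```

Any configuration this function returns is a case where the model did better with less input, which is the pattern the summary attributes to brittle fusion.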

Takeaways, Limitations

Takeaways:
The paper presents OmniPlay, a new benchmark for evaluating the intelligence of agent models that integrate and interact with diverse modalities.
It exposes weaknesses in existing multimodal models (a lack of robust reasoning and strategic planning) and traces them to a cause (brittle fusion mechanisms).
The discovery of the "less is more" paradox highlights both the importance and the difficulty of modality integration.
The results suggest that developing robust AGI requires research on synergistic fusion, not just simple scaling.
Limitations:
Further research is needed to determine the generalizability of the OmniPlay benchmark.
The evaluation covers a limited number and range of models.
More comprehensive research on different types of modality conflicts and interactions is needed.