OmniPlay is a new benchmark for evaluating the intelligence of interactive agent models that leverage diverse sensory information. To overcome the limitations of existing benchmarks, it integrates multiple modalities, including visual, auditory, and temporal information, within interactive game environments. Across its five games, OmniPlay creates interdependencies and conflicts between modalities to probe agents' cross-modal reasoning abilities. An evaluation of six leading multimodal models revealed superhuman performance on high-resolution memory tasks but significant failures on tasks requiring robust reasoning and strategic planning. This vulnerability stems from brittle fusion mechanisms: performance degrades sharply when modalities conflict. The evaluation also uncovered a "less is more" paradox, in which removing sensory information paradoxically improves performance. These findings suggest that progress toward robust AGI requires more than simple scaling; it must explicitly address synergistic multimodal fusion.