Existing Theory of Mind (ToM) benchmarks rely on variations of the Sally-Anne test, which offers only a narrow perspective on ToM and overlooks the complexity of human social interaction. In this paper, we propose a novel benchmark, ToM-SSI, designed to test ToM abilities in environments rich in social interaction and spatial dynamics. While existing ToM benchmarks are limited to text-only or dyadic interactions, ToM-SSI is multimodal and encompasses group interactions of up to four agents moving in a context-sensitive, interactive environment. This design allows us, for the first time, to study mixed cooperative-interfering settings and parallel inference about the mental states of multiple agents, capturing a broader range of social cognition than existing benchmarks. Our evaluation reveals that the performance of current models remains severely limited, especially on these novel tasks, highlighting important gaps for future research.