Daily Arxiv

This is a page that curates AI-related papers published worldwide.
All content here is summarized using Google Gemini and operated on a non-profit basis.
Copyright for each paper belongs to the authors and their institutions; please make sure to credit the source when sharing.

ToM-SSI: Evaluating Theory of Mind in Situated Social Interactions

Created by
  • Haebom

Authors

Matteo Bortoletto, Constantin Ruhdorfer, Andreas Bulling

Outline

Existing Theory of Mind (ToM) benchmarks rely on variations of the Sally-Anne test, which offers only a very limited perspective on ToM and overlooks the complexity of human social interaction. In this paper, we propose ToM-SSI, a new benchmark designed specifically to test ToM abilities in environments rich in social interaction and spatial dynamics. Whereas existing ToM benchmarks are limited to text-only or dyadic interactions, ToM-SSI is multimodal and covers group interactions of up to four agents moving in a context-sensitive, interactive environment. This design allows us, for the first time, to study mixed cooperative-interfering settings and parallel inference about the mental states of multiple agents, capturing a broader range of social cognition than existing benchmarks. Our evaluation reveals that the performance of current models remains severely limited, especially on these new tasks, highlighting important gaps for future research.

Takeaways, Limitations

Takeaways: ToM-SSI, a new benchmark for assessing ToM abilities in environments rich in social interaction and spatial dynamics, addresses the limitations of existing benchmarks and enables more comprehensive ToM research. It can assess a wide range of social-cognitive abilities, including cooperative and interfering interactions among up to four agents.
Limitations: Evaluation on ToM-SSI shows that the performance of current models is still very limited, leaving substantial room for improvement in future research. Furthermore, ToM-SSI may not capture the full complexity of real-world human social interactions.