Daily Arxiv

This page curates AI-related papers published worldwide.
All content is summarized using Google Gemini, and the page is operated on a non-profit basis.
Copyright for each paper belongs to its authors and their institutions; please credit the source when sharing.

CARMA: Context-Aware Situational Grounding of Human-Robot Group Interactions by Combining Vision-Language Models with Object and Action Recognition

Created by
  • Haebom

Author

Joerg Deigmoeller, Stephan Hasler, Nakul Agarwal, Daniel Tanneberg, Anna Belardinelli, Reza Ghoddoosian, Chao Wang, Felix Ocker, Fan Zhang, Behzad Dariush, Michael Gienger

Outline

CARMA is a system that provides situational grounding for human-robot group interaction. Effective group collaboration requires contextual awareness built on consistent representations of the people and objects currently present, together with episodic abstractions of actors and the objects they manipulate. This in turn requires explicit, consistent instance assignment, so that robots can accurately recognize and track actors, objects, and their interactions over time. CARMA uniquely identifies physical instances of these entities in the real world and organizes them into grounded triplets of actors, objects, and actions. We evaluate its role separation, multi-actor recognition, and consistent instance identification capabilities through experiments on collaborative following, object hand-over, and sorting tasks. The results show that the system reliably generates accurate actor-action-object triplets, providing a structured and robust foundation for applications in collaborative environments that require spatiotemporal reasoning and contextual decision making.
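To make the triplet idea concrete, here is a minimal sketch of how grounded actor-action-object triplets with persistent instance IDs might be represented and queried. The names (`GroundedTriplet`, the `actor_id`/`object_id` fields, and the example labels) are illustrative assumptions, not CARMA's actual interface:

```python
from dataclasses import dataclass

# Hypothetical sketch of a grounded triplet: each tracked person and object
# keeps a persistent instance ID, so the same cup or person can be referenced
# consistently across the episode.

@dataclass(frozen=True)
class GroundedTriplet:
    actor_id: str    # e.g. "person_1" -- a tracked person instance
    action: str      # recognized action label, e.g. "hand_over"
    object_id: str   # e.g. "cup_2" -- a tracked object instance

def triplets_for_actor(triplets, actor_id):
    """Filter the running triplet log down to one actor's episode."""
    return [t for t in triplets if t.actor_id == actor_id]

# A small example log of grounded interactions over time.
log = [
    GroundedTriplet("person_1", "pick_up", "cup_2"),
    GroundedTriplet("person_2", "point_at", "box_1"),
    GroundedTriplet("person_1", "hand_over", "cup_2"),
]

print(triplets_for_actor(log, "person_1"))
```

Because instance IDs stay stable across frames, downstream spatiotemporal reasoning can ask episodic questions such as "what has person_1 done with cup_2 so far" directly over the log.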

Takeaways, Limitations

Takeaways:
We present an effective system for situational awareness in human-robot group interaction.
The system uniquely identifies physical instances in the real world and structures the situation by composing them into actor-object-action triplets.
We experimentally verified the accuracy and stability of the system on collaborative tasks (following, object hand-over, and sorting).
It provides a robust foundation for collaborative applications that require spatiotemporal reasoning and contextual decision making.
Limitations:
The paper does not explicitly discuss its limitations or future research directions.
The constrained experimental setting may limit generalization to a wider range of real-world situations.
Experiments with more complex and diverse interactions are needed to evaluate the system's performance more broadly.