CARMA is a system for contextual grounding in human-robot group interaction. Effective group collaboration requires contextual awareness built on consistent representations of the people and objects currently present, together with episodic abstractions of actors and the objects they manipulate. This in turn requires explicit and consistent instance assignment, so that robots can accurately recognize and track actors, objects, and their interactions over time. CARMA uniquely identifies physical instances of these entities in the real world and organizes them into grounded actor-action-object triplets. We evaluate its role separation, multi-actor recognition, and consistent instance identification through experiments on collaborative following, object handing, and classification tasks. Our experimental results demonstrate that the system reliably generates accurate actor-action-object triplets, providing a structured and robust foundation for applications in collaborative environments that require spatiotemporal reasoning and contextual decision making.
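
To make the idea of a grounded triplet concrete, the following is a minimal illustrative sketch and not CARMA's actual implementation. All names here (`Instance`, `Triplet`, `InstanceRegistry`, `resolve`, the track keys, and the action strings) are assumptions introduced for illustration. The sketch shows one way a stable identifier could be assigned to each physical entity so that triplets recorded at different times refer to the same instance.

```python
from dataclasses import dataclass, field
from itertools import count


@dataclass(frozen=True)
class Instance:
    """A uniquely identified physical entity (hypothetical structure)."""
    uid: str    # stable identifier, e.g. "person_1"
    label: str  # category label, e.g. "person" or "cup"


@dataclass(frozen=True)
class Triplet:
    """One grounded actor-action-object observation at a given time."""
    actor: Instance
    action: str
    obj: Instance
    timestamp: float


@dataclass
class InstanceRegistry:
    """Assigns each tracked entity one stable identifier (assumed design)."""
    _by_track: dict = field(default_factory=dict)
    _counters: dict = field(default_factory=dict)

    def resolve(self, track_key: str, label: str) -> Instance:
        # Reuse the stored instance so the same physical entity keeps its ID.
        if track_key not in self._by_track:
            counter = self._counters.setdefault(label, count(1))
            self._by_track[track_key] = Instance(f"{label}_{next(counter)}", label)
        return self._by_track[track_key]


registry = InstanceRegistry()
alice = registry.resolve("track-A", "person")
cup = registry.resolve("track-C", "cup")
bob = registry.resolve("track-B", "person")

# A short hand-over episode expressed as grounded triplets.
episode = [
    Triplet(alice, "hold", cup, 0.0),
    Triplet(registry.resolve("track-A", "person"), "hand_over", cup, 1.5),
    Triplet(bob, "hold", cup, 2.0),
]
```

In this sketch, resolving the same track key twice returns the same `Instance`, which is the property that lets two actors of the same category be told apart across an episode.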