This paper focuses on reducing latency in AI inference workflows, which consist of pipelines or graphs of event-triggered AI programs. Standard latency-reduction techniques for streaming settings, such as caching or optimization-based scheduling, have limited effectiveness here because the data objects an AI program accesses (models, databases) vary with the triggering event. We propose a novel affinity grouping mechanism that lets developers easily express application-specific data access relationships, enabling coordinated management of data objects across the server clusters that host streaming inference jobs. The mechanism complements, rather than replaces, approaches such as caching and scheduling. Experimental results confirm the limitations of standard techniques and show that the proposed mechanism maintains significantly lower latency, with minimal code changes, as workload intensity and cluster scale grow.
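To make the idea of affinity grouping concrete, the sketch below shows one possible shape such a developer-facing annotation could take; the names (`AffinityGroup`, `affinity`, `handle_event`) and the decorator-based registration are illustrative assumptions, not the paper's actual API. The point is only that a handler declares which data objects it touches together, so a placement layer can co-locate them.

```python
# Minimal sketch of an affinity-grouping annotation (hypothetical API, not the
# paper's): the developer declares which data objects an event-triggered
# program accesses together, so the runtime can co-locate them on one server.

from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class AffinityGroup:
    """Names a set of data objects that should be managed together."""
    name: str
    objects: List[str] = field(default_factory=list)


# Registry mapping each event handler to the affinity group it declared.
_AFFINITY_REGISTRY: Dict[str, AffinityGroup] = {}


def affinity(group: AffinityGroup) -> Callable:
    """Decorator: attach an affinity group to an event-triggered program."""
    def wrap(fn: Callable) -> Callable:
        _AFFINITY_REGISTRY[fn.__name__] = group
        return fn
    return wrap


# Example (hypothetical workload): a scoring stage that always needs the same
# model weights and database shard; declaring them as one group lets a
# placement layer keep both resident on whichever node runs this handler.
scoring_group = AffinityGroup(
    name="fraud-scoring",
    objects=["models/fraud_v3.onnx", "features/card_history_shard_7"],
)


@affinity(scoring_group)
def handle_event(event: dict) -> float:
    # ... load the group's objects (already co-located) and run inference ...
    return 0.0


if __name__ == "__main__":
    # A scheduler or object manager could consult the registry at placement time.
    for handler, group in _AFFINITY_REGISTRY.items():
        print(handler, "->", group.name, group.objects)
```

Under this kind of interface, the "minimal code changes" claimed in the abstract would amount to adding group declarations and decorators to existing handlers, while cross-server coordination stays inside the runtime.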