Daily Arxiv

This page curates AI-related papers published worldwide.
All content is summarized using Google Gemini, and the site is operated on a non-profit basis.
Copyright for each paper belongs to its authors and their institutions; please credit the source when sharing.

Keep Your Friends Close: Leveraging Affinity Groups to Accelerate AI Inference Workflows

Created by
  • Haebom

Authors

Thiago Garrett, Weijia Song, Roman Vitenberg, Ken Birman

Outline

This paper focuses on reducing latency in AI inference workflows, which consist of pipelines or graphs of event-triggered AI programs. Standard latency-reduction techniques for streaming settings, such as caching and optimization-based scheduling, lose much of their effectiveness here because the data an AI program accesses (models, databases) varies with the triggering event. The paper proposes a novel affinity grouping mechanism that lets developers express application-specific data access relationships, enabling coordinated management of data objects across the server clusters hosting streaming inference jobs. The mechanism complements other approaches such as caching and scheduling. Experiments confirm the limitations of the standard techniques and show that the proposed mechanism maintains significantly lower latency with minimal code changes as workload and scale grow.
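To make the idea concrete, here is a minimal sketch of what declaring and using affinity groups might look like. This is not the paper's actual API: the class name AffinityGroupRegistry, its methods, and the least-loaded placement policy are all hypothetical, chosen only to illustrate co-locating a group's data objects on one server and routing triggering events to that server.

```python
from collections import defaultdict


class AffinityGroupRegistry:
    """Hypothetical sketch: map application-defined affinity keys to the
    data objects (models, database shards) accessed together, and pin each
    group to a single server so a triggering event finds its data locally."""

    def __init__(self, servers):
        self.servers = list(servers)
        self.groups = defaultdict(set)   # affinity key -> data object ids
        self.placement = {}              # affinity key -> server

    def register(self, affinity_key, *object_ids):
        """Developer declares that these objects are accessed together."""
        self.groups[affinity_key].update(object_ids)
        # Place new groups on the least-loaded server (illustrative policy only).
        if affinity_key not in self.placement:
            load = {s: 0 for s in self.servers}
            for key, srv in self.placement.items():
                load[srv] += len(self.groups[key])
            self.placement[affinity_key] = min(load, key=load.get)

    def route(self, affinity_key):
        """Route a triggering event to the server hosting its group, so the
        model weights and database entries it needs are already resident."""
        return self.placement.get(affinity_key)


# Example usage: co-locate a per-camera model and its feature store,
# then route each camera's events to the server holding that group.
registry = AffinityGroupRegistry(servers=["node-1", "node-2"])
registry.register("camera-17", "detector-weights-v3", "feature-shard-17")
registry.register("camera-42", "detector-weights-v3", "feature-shard-42")
print(registry.route("camera-17"))
```

The point of the sketch is the division of labor the paper describes: the developer supplies only the grouping hints, while placement and routing decisions stay inside the runtime, which is why only minimal application code changes are needed.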

Takeaways, Limitations

Takeaways:
Presents a novel affinity grouping mechanism that effectively reduces the latency of AI inference workflows.
Overcomes the limitations of existing streaming techniques by accounting for application-specific data access relationships.
Demonstrates that performance gains can be achieved with minimal code changes.
Confirms complementary effects with existing techniques such as caching and scheduling.
Limitations:
Further research is needed to establish the generalizability of the proposed mechanism in real-world deployment environments.
Broader experiments are needed across different types of AI inference tasks and data access patterns.
Further work is needed on applying the mechanism to complex AI workflows and evaluating its performance there.