Daily Arxiv

This is a page that curates AI-related papers published worldwide.
All content here is summarized using Google Gemini, and the page is operated on a non-profit basis.
Copyright for each paper belongs to the authors and their institutions; please make sure to credit the source when sharing.

Cold-RL: Learning Cache Eviction with Offline Reinforcement Learning for NGINX

Created by
  • Haebom

Author

Aayush Gupta, Arpit Bhayani

Outline

This paper addresses the limitations of the Least-Recently-Used (LRU) eviction policy used in web proxies such as NGINX and proposes Cold-RL, a reinforcement-learning-based eviction policy. Cold-RL serves a dueling Deep Q-Network (DQN) from an ONNX sidecar so that eviction decisions complete within a strict budget of under 500 microseconds. On each eviction, six lightweight features are extracted from candidate objects to select victims: age, size, hit count, inter-arrival time, remaining TTL, and last origin RTT. Training is performed offline in a simulator that replays NGINX access logs. Experimental results show that Cold-RL outperforms LRU, LFU, size-based, adaptive-LRU, and hybrid baselines across cache sizes: it delivers a 146% improvement at a small cache size (25 MB) and matches the baselines at a large cache size (400 MB). Inference adds less than 2% CPU overhead, and the 95th-percentile latency stays within 500 microseconds.
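
To make the mechanics concrete, below is a minimal sketch (not the authors' implementation) of a dueling Q-network that scores eviction candidates from the six features named above. The class name, layer sizes, feature ordering, and the keep/evict action encoding are all assumptions for illustration.

```python
import torch
import torch.nn as nn

# Hypothetical per-object feature vector; the six features follow the paper's
# description (age, size, hit count, inter-arrival time, remaining TTL,
# last origin RTT), but names, ordering, and scaling are assumptions.
FEATURES = ["age", "size", "hit_count", "inter_arrival", "remaining_ttl", "last_origin_rtt"]

class DuelingDQN(nn.Module):
    """Minimal dueling Q-network: shared trunk, separate value and advantage
    heads, combined as Q = V + (A - mean(A))."""
    def __init__(self, n_features: int = 6, hidden: int = 64):
        super().__init__()
        self.trunk = nn.Sequential(nn.Linear(n_features, hidden), nn.ReLU())
        self.value = nn.Linear(hidden, 1)        # state value V(s)
        self.advantage = nn.Linear(hidden, 2)    # advantage for {keep, evict}

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h = self.trunk(x)
        v = self.value(h)
        a = self.advantage(h)
        return v + a - a.mean(dim=-1, keepdim=True)

# Score a batch of LRU-tail candidates and build an eviction mask
# (illustrative only; the real policy is trained on replayed access logs).
net = DuelingDQN()
candidates = torch.rand(8, 6)                    # 8 candidates x 6 features
q = net(candidates)                              # per-candidate Q(keep), Q(evict)
evict_mask = (q[:, 1] > q[:, 0]).tolist()        # evict where Q(evict) > Q(keep)
print(evict_mask)
```
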

Takeaways, Limitations

Takeaways:
Demonstrates that reinforcement learning can improve the cache eviction policy of web proxies.
Shows that an effective RL-based eviction policy can operate even under a strict latency budget (500 microseconds); a sidecar-style sketch follows the Limitations list below.
Achieves substantial gains over existing techniques at small cache sizes.
Presents the first reinforcement-learning-based eviction policy integrated into real NGINX.
Limitations:
It relies on offline training, so adapting to workload changes at run time may require further research.
The evaluation covers specific workloads, so generalizability needs further verification.
Only six lightweight features are used; richer features might improve performance further.
Additional testing in diverse real-world environments is needed.
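
As a rough illustration of how a sidecar could keep inference inside the 500-microsecond budget noted above, the sketch below exports a stand-in policy to ONNX, serves it with onnxruntime, and falls back to plain LRU when the deadline is missed. The model, file name, and fallback rule are assumptions for illustration, not the paper's actual NGINX integration.

```python
import time
import numpy as np
import torch
import torch.nn as nn
import onnxruntime as ort

# Budget per eviction decision, in microseconds (from the paper's constraint).
BUDGET_US = 500

# Stand-in policy exported to ONNX; the real policy would be the trained
# eviction network, not this randomly initialized one.
policy = nn.Sequential(nn.Linear(6, 32), nn.ReLU(), nn.Linear(32, 2))
torch.onnx.export(policy, torch.rand(1, 6), "policy.onnx",
                  input_names=["features"],
                  dynamic_axes={"features": {0: "batch"}})

session = ort.InferenceSession("policy.onnx")

def choose_victims(features: np.ndarray) -> list[int]:
    """Return indices of candidates to evict; fall back to the LRU head
    (index 0) if the inference call misses the deadline (assumed fallback)."""
    start = time.perf_counter()
    q = session.run(None, {"features": features.astype(np.float32)})[0]
    elapsed_us = (time.perf_counter() - start) * 1e6
    if elapsed_us > BUDGET_US:
        return [0]                               # deadline missed: plain LRU
    return [i for i, (keep, evict) in enumerate(q) if evict > keep]

print(choose_victims(np.random.rand(8, 6)))
```
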