Mooncake is the serving platform for Kimi, the main LLM service provided by Moonshot AI. Mooncake features a KVCache-centric disaggregated architecture that separates the prefill and decoding clusters. It also leverages the underutilized CPU, DRAM, and SSD resources of the GPU cluster to implement a distributed cache of KVCache. At the heart of Mooncake is a KVCache-centric scheduler that maximizes overall effective throughput while meeting latency-related service-level objectives (SLOs). Unlike existing research that assumes all requests will be processed, Mooncake must contend with highly overloaded scenarios. To mitigate this, we developed a prediction-based early rejection policy. Experimental results show that Mooncake excels in long-context scenarios. Compared to baseline methods, Mooncake can increase throughput by up to 525% in certain simulated scenarios while meeting SLOs. Under real-world workloads, Mooncake's innovative architecture enables Kimi to handle up to 75% more requests.
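
To illustrate the idea behind a prediction-based early rejection policy, the sketch below shows a minimal admission check that predicts whether accepting a new request would violate the TTFT or TBT SLO and, if so, rejects it before prefill begins. This is only a simplified illustration under assumed linear cost models; all names, data structures, and thresholds here are hypothetical and not Mooncake's actual implementation.

```python
# Illustrative sketch of prediction-based early rejection (assumed cost models).
from dataclasses import dataclass

@dataclass
class Request:
    prompt_tokens: int           # length of the incoming prompt
    expected_output_tokens: int  # rough estimate of generation length

@dataclass
class ClusterState:
    queued_prefill_tokens: int   # tokens already waiting in the prefill cluster
    active_decode_batch: int     # requests currently in the decoding batch
    prefill_tokens_per_sec: float
    decode_tokens_per_sec: float

TTFT_SLO_S = 10.0   # assumed time-to-first-token SLO (seconds)
TBT_SLO_S = 0.10    # assumed time-between-tokens SLO (seconds)

def predict_ttft(req: Request, state: ClusterState) -> float:
    """Predict TTFT: drain the queued prefill work, then run this request's prefill."""
    total_tokens = state.queued_prefill_tokens + req.prompt_tokens
    return total_tokens / state.prefill_tokens_per_sec

def predict_tbt(req: Request, state: ClusterState) -> float:
    """Predict TBT: assume per-step decode latency grows with the batch size."""
    batch_size = state.active_decode_batch + 1
    return batch_size / state.decode_tokens_per_sec

def admit(req: Request, state: ClusterState) -> bool:
    """Reject early if either latency SLO is predicted to be violated."""
    return predict_ttft(req, state) <= TTFT_SLO_S and predict_tbt(req, state) <= TBT_SLO_S

if __name__ == "__main__":
    state = ClusterState(queued_prefill_tokens=200_000, active_decode_batch=48,
                         prefill_tokens_per_sec=50_000, decode_tokens_per_sec=600)
    req = Request(prompt_tokens=32_000, expected_output_tokens=512)
    print("admit" if admit(req, state) else "reject early")
```

Rejecting at admission time, rather than after prefill has already consumed GPU cycles, is what allows an overloaded system to avoid wasting work on requests that cannot meet their SLOs.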