This paper presents Block, a distributed scheduling framework that leverages contextual information about incoming requests to optimize load balancing and automatic provisioning across instances of a large-scale language model serving system. Unlike existing model serving systems that rely on monolithic, heuristic task schedulers, Block operates as a fully distributed, stateless, and predictive scheduler, yielding low overhead, high reliability, and strong scalability. It exploits the deterministic and predictable properties of LLM inference, such as host configuration, response length, and hardware performance, to make scheduling decisions based on accurately predicted metrics. Evaluation on a 12-GPU cluster demonstrates that Block significantly outperforms heuristic schedulers, increasing serving capacity by up to 16.7% and reducing P99 latency by up to 49.5%. These gains hold consistently across a variety of models, workloads, and configurations. The code and data are open source.