Long video generation is fundamentally a long-context memory problem: models must retain and retrieve salient events across long horizons without collapse or drift. Scaling diffusion transformers to long-context video, however, is limited by the quadratic cost of self-attention, which makes long-horizon memory computationally impractical. This paper recasts the problem as an internal information-retrieval task and proposes a simple, learnable sparse attention-routing module, Mixture of Contexts (MoC), which serves as an effective long-term memory retrieval engine. In MoC, each query dynamically selects a few informative chunks plus mandatory anchors (the caption and local windows) to attend to, with causal routing that prevents loop closures. By scaling the data and progressively sparsifying the routing, the model allocates compute to salient history, preserving identities, actions, and scenes over minutes of content. Efficiency emerges as a by-product of retrieval (near-linear scaling), enabling practical training and synthesis, and the model exhibits memory and consistency at the minute scale.
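The routing step described above can be illustrated with a minimal sketch. All names, shapes, and the mean-pooling descriptor below are assumptions for illustration, not the paper's exact implementation: each chunk of past keys is summarized, chunks are scored against the query, the top-k causal chunks are kept, and mandatory anchor chunks are always included.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 8                         # head dimension (toy size)
n_chunks, chunk_len = 6, 4    # context split into fixed-length chunks
keys = rng.normal(size=(n_chunks, chunk_len, d))

def select_chunks(query, query_chunk, keys, anchors, top_k=2):
    """Illustrative Mixture-of-Contexts-style routing (hypothetical
    helper, not the paper's code): score each chunk by the dot product
    of the query with its mean-pooled keys, keep the top-k causal
    chunks, and always include mandatory anchors (e.g. the caption
    chunk and the query's local window)."""
    descriptors = keys.mean(axis=1)        # (n_chunks, d) chunk summaries
    scores = descriptors @ query           # (n_chunks,) relevance scores
    scores[query_chunk + 1:] = -np.inf     # causal routing: no future chunks
    ranked = np.argsort(scores)[::-1]      # most relevant first
    chosen = set(ranked[:top_k]) | set(anchors)
    return sorted(c for c in chosen if c <= query_chunk)

query = rng.normal(size=d)
# anchors: chunk 0 stands in for the caption, chunk 4 for the local window
selected = select_chunks(query, query_chunk=4, keys=keys, anchors=[0, 4])
print(selected)   # a small subset of past chunks; full attention runs only on these
```

Because each query attends only to the few selected chunks plus anchors, attention cost grows with the number of retrieved tokens rather than the full context length, which is the source of the near-linear scaling claimed above.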