Continual pretraining on small-scale, task-specific data is an effective way to adapt large language models to new target domains, but it risks catastrophic forgetting of the models' original capabilities. Existing domain reweighting strategies rely on manually specified heuristics drawn from human intuition or empirical results. In this paper, we propose Data Mixing Agent, the first model-based, end-to-end framework that parameterizes more general reweighting heuristics. The agent learns generalizable heuristics via reinforcement learning over a large number of data mixing trajectories paired with feedback from an evaluation environment. In continual pretraining experiments on mathematical reasoning, Data Mixing Agent outperforms strong baselines in achieving balanced performance across source- and target-field benchmarks. It also generalizes well to unseen source fields, target models, and domain spaces without retraining. Direct application to code generation further demonstrates its adaptability across target domains. Additional analysis shows that the agent's heuristics align well with human intuition and achieve strong model performance with less source-field data.
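To make the trajectory-and-feedback loop concrete, the following is a minimal, self-contained sketch of how domain-weight candidates can be scored by rolling out data mixing trajectories and querying an evaluation environment. All names (`rollout_mixing_trajectory`, `search_mixture`, the domain list, and the proxy scores) are illustrative assumptions, not the paper's interface, and simple random search stands in for the agent's reinforcement learning policy.

```python
import random

# Illustrative domain space: source domains the base model was trained on,
# plus a new target domain (e.g., math). These names are hypothetical.
SOURCE_DOMAINS = ["web", "books", "code"]
TARGET_DOMAINS = ["math"]
DOMAINS = SOURCE_DOMAINS + TARGET_DOMAINS


def rollout_mixing_trajectory(weights, steps=20, batch_size=8):
    """Sample training batches under one domain-weight setting and return a
    scalar reward from a stand-in evaluation environment."""
    source_tokens = total_tokens = 0
    for _ in range(steps):
        batch = random.choices(DOMAINS, weights=weights, k=batch_size)
        # A real system would run a continual pretraining step on this batch.
        total_tokens += len(batch)
        source_tokens += sum(d in SOURCE_DOMAINS for d in batch)
    source_score = source_tokens / total_tokens   # proxy for retained source-field ability
    target_score = 1.0 - source_score             # proxy for target-field gains
    return min(source_score, target_score)        # reward balanced performance


def search_mixture(num_trajectories=200):
    """Toy 'agent': score many candidate mixtures by trajectory reward and keep
    the best one (random search standing in for reinforcement learning)."""
    best_weights, best_reward = None, float("-inf")
    for _ in range(num_trajectories):
        raw = [random.random() for _ in DOMAINS]
        weights = [w / sum(raw) for w in raw]     # normalize to a mixture
        r = rollout_mixing_trajectory(weights)
        if r > best_reward:
            best_weights, best_reward = weights, r
    return dict(zip(DOMAINS, best_weights)), best_reward


if __name__ == "__main__":
    mixture, score = search_mixture()
    print({d: round(w, 2) for d, w in mixture.items()}, "reward:", round(score, 2))
```

In the paper's setting, the evaluation environment corresponds to source- and target-field benchmarks, and the learned policy replaces this brute-force search so that new domain spaces and target models can be handled without retraining.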