This paper addresses the problem of determining optimal data mixing ratios for large-scale language model training. Rather than relying on conventional heuristic search, we cast data mixing ratio selection as a black-box hyperparameter optimization problem and solve it with Bayesian optimization. We systematically study how mixing ratios learned in small-scale experiments transfer to large-scale training, and employ multi-fidelity Bayesian optimization to balance experimental cost against model fit. We conduct pretraining and instruction fine-tuning (IFT) experiments on models ranging from 1 million to 7 billion parameters across a range of benchmarks, demonstrating speedups of up to 500% over existing methods. Furthermore, we release the ADMIRE IFT Runs dataset, containing 460 full training and evaluation runs across various model sizes, to facilitate further research.