Achieving safe and coordinated behavior in dynamic, constrained environments is a key challenge for learning-based control. This paper proposes a hierarchical framework that combines tactical decision-making via reinforcement learning (RL) with low-level execution via model predictive control (MPC). In the multi-agent setting, a high-level RL policy selects an abstract goal from a structured region of interest (ROI), while MPC computes dynamically feasible and safe motions toward it. Evaluated on a predator-prey benchmark, our approach outperforms end-to-end and shielding-based RL baselines in terms of reward, safety, and consistency, highlighting the benefits of combining structured learning with model-based control.
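To make the division of labor concrete, the following minimal Python sketch mirrors the two-level loop the abstract describes. Every name in it is an illustrative assumption rather than the paper's implementation: the ROI discretization, the greedy goal selector standing in for the trained RL policy, and the saturated proportional step standing in for the MPC solver are placeholders.

```python
import numpy as np

# Hypothetical sketch of the hierarchical RL + MPC loop described above.
# The ROI discretization, goal-selection policy, and MPC stand-in are
# illustrative assumptions, not the paper's actual implementation.

def roi_goals(center, radius=2.0, n=8):
    """Discretize a region of interest around `center` into n candidate goals."""
    angles = np.linspace(0.0, 2.0 * np.pi, n, endpoint=False)
    return center + radius * np.stack([np.cos(angles), np.sin(angles)], axis=1)

def high_level_policy(state, goals):
    """Stand-in for the trained RL policy: picks a goal index greedily.
    A learned policy would instead map the observation to a goal distribution."""
    return int(np.argmin(np.linalg.norm(goals - state, axis=1)))

def mpc_step(state, goal, u_max=1.0, dt=0.1):
    """Stand-in for the MPC layer: a saturated proportional step toward the goal.
    A real MPC would solve a constrained finite-horizon optimization here."""
    u = np.clip(goal - state, -u_max, u_max)   # respect input constraints
    return state + dt * u                      # assumed single-integrator dynamics

state = np.zeros(2)                            # pursuing agent's position
prey = np.array([5.0, 3.0])                    # static target for illustration
for t in range(50):
    goals = roi_goals(prey)                    # structured ROI around the target
    g = high_level_policy(state, goals)        # tactical decision (RL level)
    state = mpc_step(state, goals[g])          # feasible, safe execution (MPC level)
print("final agent position:", state)
```

The sketch only illustrates the interface between the two levels: the high-level policy reasons over a small discrete goal set, while the low-level controller handles continuous dynamics and constraints at every step.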