Inverse reinforcement learning (IRL) infers a reward function that explains expert demonstrations. Modern IRL methods often adopt an adversarial (min-max) formulation that alternates between reward and policy optimization, which frequently leads to unstable training. Recent non-adversarial IRL approaches improve stability by jointly learning the reward and the policy through energy-based formulations, but they lack formal guarantees. This work addresses that gap. First, we present a unified view showing that standard non-adversarial methods explicitly or implicitly maximize the likelihood of expert actions, which is equivalent to minimizing the expected return difference. This insight leads to our main contribution: Trust Region Reward Optimization (TRRO), a framework that guarantees monotonic improvement of this likelihood via a Minorization-Maximization (MM) procedure. We instantiate TRRO as Proximal Inverse Reward Optimization (PIRO), a practical and stable IRL algorithm. Theoretically, TRRO provides an IRL counterpart to the stability guarantees of Trust Region Policy Optimization (TRPO) in forward RL. Empirically, PIRO matches or outperforms state-of-the-art baselines in reward recovery and policy imitation, with high sample efficiency, on MuJoCo and Gym-Robotics benchmarks as well as on real-world animal behavior modeling tasks.
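To make the expert-action likelihood referenced above concrete, the following is a schematic statement of maximum-likelihood IRL under standard maximum-entropy assumptions; the symbols $r_\theta$ (parameterized reward), $\pi_{r_\theta}$ (the entropy-regularized optimal policy under $r_\theta$), $\mathcal{D}_E$ (expert demonstrations), and $\gamma$ (discount factor) are illustrative notation, not necessarily the paper's:
\[
\max_{\theta} \;\; \mathbb{E}_{(s,a) \sim \mathcal{D}_E}\!\left[ \log \pi_{r_\theta}(a \mid s) \right],
\qquad
\pi_{r_\theta} \;=\; \arg\max_{\pi} \; \mathbb{E}_{\pi}\!\left[ \sum_{t=0}^{\infty} \gamma^{t} \Big( r_\theta(s_t, a_t) + \mathcal{H}\big(\pi(\cdot \mid s_t)\big) \Big) \right].
\]
Under this reading, the MM step of TRRO constructs a surrogate (minorizer) of this log-likelihood whose maximization cannot decrease the original objective, mirroring the monotonic-improvement argument behind TRPO.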