This paper studies the offline maximum entropy-regularized inverse reinforcement learning (offline MaxEnt-IRL) problem in machine learning, specifically its formulation as the dynamic discrete choice (DDC) model. The goal is to recover the reward (or Q) function that governs agent behavior from offline behavioral data. We propose a globally convergent gradient-based method that dispenses with the restrictive assumption of linearly parameterized rewards. The novelty of this study lies in an empirical risk minimization (ERM) based IRL/DDC framework that avoids explicit estimation of state transition probabilities in the Bellman equation. Moreover, the method is compatible with nonparametric estimators such as neural networks, and therefore has the potential to scale to high-dimensional or infinite state spaces. The key theoretical insight is that the Bellman residual satisfies the Polyak-Łojasiewicz (PL) condition, a property weaker than strong convexity yet sufficient to guarantee fast global convergence. A series of synthetic experiments demonstrates that the proposed method consistently outperforms benchmark methods and state-of-the-art alternatives.
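To make the ERM idea concrete, the sketch below minimizes an empirical soft-Bellman residual, Q(s,a) - r - γ logsumexp_a' Q(s',a'), over an offline dataset by full-gradient descent, using only sampled next states and never an estimated transition kernel. Everything here is illustrative and not the authors' implementation: the toy MDP (5 states, 2 actions, deterministic dynamics so the sampled residual is unbiased), the tabular Q (the paper targets neural networks; a table keeps the gradient explicit), and all hyperparameters are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy DDC instance (sizes and dynamics are illustrative):
# 5 states, 2 actions, deterministic transitions, Gaussian rewards.
S, A, gamma = 5, 2, 0.5
true_r = rng.normal(size=(S, A))

# Offline dataset of (s, a, r, s') transitions from a uniform behavior policy.
N = 2000
s = rng.integers(S, size=N)
a = rng.integers(A, size=N)
r = true_r[s, a]
s2 = (s + a + 1) % S  # deterministic dynamics keep the sampled residual unbiased

def soft_value(q_rows):
    """Entropy-regularized (soft) state value: logsumexp over actions."""
    m = q_rows.max(axis=1, keepdims=True)
    return m[:, 0] + np.log(np.exp(q_rows - m).sum(axis=1))

def softmax(q_rows):
    e = np.exp(q_rows - q_rows.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

Q = np.zeros((S, A))  # learnable tabular Q
lr, steps = 1.0, 1000
losses = []
for _ in range(steps):
    # Empirical soft-Bellman residual on the offline data.
    delta = Q[s, a] - (r + gamma * soft_value(Q[s2]))
    losses.append(float(np.mean(delta ** 2)))
    # Full (not semi-) gradient of the mean squared residual w.r.t. Q:
    # the next-state term differentiates through logsumexp via a softmax.
    grad = np.zeros_like(Q)
    np.add.at(grad, (s, a), 2.0 * delta / N)
    next_contrib = (-2.0 * gamma / N) * delta[:, None] * softmax(Q[s2])
    np.add.at(grad, (s2[:, None], np.arange(A)[None, :]), next_contrib)
    Q -= lr * grad

print(f"loss: {losses[0]:.3f} -> {losses[-1]:.2e}")
```

Because no transition model is fit, each gradient step touches only observed tuples; the monotone drop of the residual to (numerically) zero is what the PL condition guarantees in general for this objective.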