In this paper, we study the offline maximum entropy-regularized inverse reinforcement learning (offline MaxEnt-IRL) problem in machine learning, known in econometrics as the dynamic discrete choice (DDC) model. The goal is to recover the reward function, or equivalently the Q* function, that governs the agent's behavior from offline behavior data. We propose a globally convergent gradient-based method that solves this problem without the restrictive assumption of linearly parameterized rewards. The novelty of this study lies in introducing an empirical risk minimization (ERM)-based IRL/DDC framework that does not require explicit estimation of the state transition probabilities in the Bellman equation. The framework is also compatible with nonparametric estimation techniques such as neural networks, so the proposed method has the potential to scale to high-dimensional, infinite state spaces. The key theoretical insight of this study is that the Bellman residual satisfies the Polyak-Łojasiewicz (PL) condition, which is weaker than strong convexity yet sufficient to guarantee fast global convergence. Through a series of synthetic experiments, we show that the proposed method consistently outperforms benchmark methods and state-of-the-art alternatives.
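For concreteness, the two standard ingredients behind these claims can be sketched as follows; the notation ($Q^*$, $r$, $\gamma$, $\theta$, $\mu$, $\ell$) is illustrative and the paper's exact ERM objective may differ from this sketch. Under maximum entropy regularization (equivalently, type-I extreme value shocks in the DDC model), the optimal Q-function satisfies the soft Bellman equation
\[
Q^*(s,a) \;=\; r(s,a) \;+\; \gamma\, \mathbb{E}_{s' \mid s,a}\!\Big[\log \textstyle\sum_{a'} \exp Q^*(s',a')\Big],
\]
and a differentiable loss $\mathcal{L}$ with minimum value $\mathcal{L}^*$ satisfies the PL condition with constant $\mu > 0$ if
\[
\tfrac{1}{2}\,\big\|\nabla \mathcal{L}(\theta)\big\|^2 \;\ge\; \mu\,\big(\mathcal{L}(\theta) - \mathcal{L}^*\big) \quad \text{for all } \theta .
\]
For an $\ell$-smooth loss, the PL condition is enough for gradient descent with step size $1/\ell$ to converge linearly, $\mathcal{L}(\theta_t) - \mathcal{L}^* \le (1 - \mu/\ell)^t\,\big(\mathcal{L}(\theta_0) - \mathcal{L}^*\big)$, even when $\mathcal{L}$ is nonconvex; this is the sense in which PL is weaker than strong convexity but still yields fast global convergence.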