This paper argues that inventory management offers unique opportunities for applying deep reinforcement learning (DRL). To this end, we present and experimentally validate two complementary techniques: Hindsight Differentiable Policy Optimization (HDPO) and Graph Neural Networks (GNNs). HDPO directly and efficiently optimizes policy performance by leveraging path-wise gradients from offline semi-empirical simulations. We demonstrate that HDPO is more robust than the REINFORCE algorithm and significantly outperforms the common Newsvendor heuristic on real-world time-series data. GNNs leverage natural inductive biases that encode supply chain structure, effectively reducing data requirements. Furthermore, we open-source the benchmark environment and codebase to address the lack of standardized benchmark problems, which hinders progress in inventory management.
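
To make the idea of path-wise gradients on a fixed demand trace concrete, the following is a minimal sketch under strong simplifying assumptions, not the implementation used in this work. It replaces the neural policy with a single order-up-to level, assumes zero lead time and backlogged demand, uses a hand-derived derivative instead of automatic differentiation, and substitutes a synthetic uniform demand trace for historical data. The function names and the cost parameters `holding` and `backlog` are hypothetical.

```python
import random


def hindsight_cost_and_grad(level, demands, holding=1.0, backlog=9.0):
    """Average per-period cost of an order-up-to policy replayed on a fixed
    demand trace, plus its path-wise derivative with respect to the level."""
    cost, grad = 0.0, 0.0
    for d in demands:
        if level >= d:
            # Leftover stock: cost rises at rate `holding` as the level rises.
            cost += holding * (level - d)
            grad += holding
        else:
            # Unmet demand: cost falls at rate `backlog` as the level rises.
            cost += backlog * (d - level)
            grad -= backlog
    n = len(demands)
    return cost / n, grad / n


def optimize_level(demands, steps=2000, lr=2.0):
    """Gradient descent on the hindsight cost with a decaying step size."""
    level = 0.0
    for t in range(steps):
        _, g = hindsight_cost_and_grad(level, demands)
        level -= lr / (1 + t) ** 0.5 * g
    return level


# Synthetic stand-in for a historical demand trace (an assumption of this sketch).
rng = random.Random(0)
demands = [rng.uniform(0, 100) for _ in range(1000)]

level = optimize_level(demands)
cost_at_level, _ = hindsight_cost_and_grad(level, demands)
cost_at_mean, _ = hindsight_cost_and_grad(sum(demands) / len(demands), demands)
```

In this toy setting the optimized level settles near the empirical `backlog / (backlog + holding)` quantile of the trace, which is the classical newsvendor solution. The interest of the path-wise approach lies in settings where no such closed form exists and the gradient must be propagated through multi-period dynamics.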