This paper addresses a core challenge in artificial intelligence: learning effective policies to control agents in unknown environments and optimize performance metrics. Off-policy learning methods, such as Q-learning, allow a learner to make optimal decisions based on past experience. This paper studies off-policy learning from biased data in complex, high-dimensional domains where unobserved confounding variables cannot be ruled out a priori. Building on the well-known Deep Q-Network (DQN), we propose a novel deep reinforcement learning algorithm that is robust to confounding biases in the observed data. Specifically, the algorithm seeks a safe policy for the worst-case environment compatible with the observations. We apply the proposed method to twelve perturbed Atari games and demonstrate that it consistently outperforms the standard DQN in all games where the observed inputs to the behavior and target policies mismatch and unobserved confounders are present.
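To make the worst-case idea concrete, the following is a minimal tabular sketch, not the paper's algorithm: it replaces the usual point estimate of the confounded reward with a Manski-style interval and backs up the lower end, so the resulting greedy policy is safe for any environment compatible with the logged data. The bound construction, the propensity argument, and all names (`manski_style_bounds`, `pessimistic_update`, the toy dimensions) are illustrative assumptions, and the deep (DQN-based) version would use a function approximator in place of the table.

```python
import numpy as np

# Minimal tabular sketch of a "pessimistic" (worst-case) Q-update.
# Idea: when logged data may be confounded, the point estimate of
# E[r | s, a] is replaced by an interval [l, u] that every environment
# compatible with the observations must satisfy; the update backs up
# the lower bound, so the learned policy is safe in the worst case.
# All names and the bound construction are illustrative, not the paper's.

n_states, n_actions = 5, 3
gamma, alpha = 0.99, 0.1

Q_lower = np.zeros((n_states, n_actions))  # pessimistic value estimate


def manski_style_bounds(r_obs, p_behavior, r_min=0.0, r_max=1.0):
    """Crude interval for the interventional reward E[r | do(a)].

    With propensity p_behavior of the behavior policy choosing `a`, the
    unobserved counterfactual mass (1 - p_behavior) could hide rewards
    anywhere in [r_min, r_max].
    """
    lower = p_behavior * r_obs + (1.0 - p_behavior) * r_min
    upper = p_behavior * r_obs + (1.0 - p_behavior) * r_max
    return lower, upper


def pessimistic_update(s, a, r_obs, p_behavior, s_next, done):
    """One worst-case Q-learning step on a logged (confounded) transition."""
    r_lo, _ = manski_style_bounds(r_obs, p_behavior)
    bootstrap = 0.0 if done else gamma * Q_lower[s_next].max()
    target = r_lo + bootstrap
    Q_lower[s, a] += alpha * (target - Q_lower[s, a])


# Example: replay one fake logged transition and act on the safe estimate.
pessimistic_update(s=2, a=1, r_obs=0.7, p_behavior=0.6, s_next=3, done=False)
greedy_action = Q_lower[2].argmax()
print(Q_lower[2], greedy_action)
```

A DQN-style variant of this sketch would regress a neural network onto the same lower-bound targets sampled from a replay buffer of logged transitions.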