This paper studies the Safe Policy Improvement (SPI) problem, an offline reinforcement learning problem in which a new policy must reliably outperform a given behavior policy using only that policy and a dataset of its interactions. To improve the data efficiency of existing MDP-based SPI methods, we propose three techniques that exploit parameter dependencies among the distributions in the transition dynamics. First, we present a parametric SPI algorithm that uses known correlations to estimate the transition dynamics more accurately. Second, we present a preprocessing technique that removes redundant actions using game-based abstraction. Third, we present a more advanced preprocessing technique, based on Satisfiability Modulo Theories (SMT) solving, that identifies additional redundant actions. Experimental results demonstrate that the proposed techniques improve the data efficiency of SPI by orders of magnitude while maintaining its reliability.