Widely used reinforcement learning algorithms such as TRPO and PPO are derived from tabular Conservative Policy Iteration (CPI), yet under function approximation they lose CPI's guarantees and can diverge, oscillate, or converge to suboptimal policies. This paper introduces Reliable Policy Iteration (RPI) and Conservative RPI (CRPI), variants of Policy Iteration (PI) and CPI that retain their tabular guarantees under function approximation. RPI replaces standard policy evaluation with a novel Bellman-constrained optimization, restoring textbook-like monotonicity of the value estimates and guaranteeing that they lower-bound the true returns. CRPI uses the same evaluation step but updates the policy conservatively, maximizing a new lower bound on the performance difference that explicitly accounts for function-approximation error; it inherits RPI's guarantees and admits stepwise improvement bounds. In initial simulations, RPI and CRPI outperform PI and its variants.
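To give intuition for how a Bellman-constrained evaluation can certify a lower bound on the true return, the following is a minimal sketch, not the paper's exact formulation: it maximizes a linearly parameterized value estimate subject to the Bellman inequality V_theta <= T^pi V_theta, which (by monotonicity of T^pi) implies V_theta <= V^pi pointwise. The function name `bellman_constrained_eval`, the linear feature matrix Phi, and the uniform state weighting are illustrative assumptions.

```python
# Hypothetical illustration: Bellman-constrained evaluation as a linear program.
# Any V satisfying V <= r_pi + gamma * P_pi @ V is a pointwise lower bound on V^pi.
import numpy as np
from scipy.optimize import linprog

def bellman_constrained_eval(P_pi, r_pi, Phi, gamma=0.9, weights=None):
    """P_pi: (S, S) transition matrix under pi; r_pi: (S,) expected rewards under pi;
    Phi: (S, k) feature matrix. Returns a value estimate that lower-bounds V^pi."""
    S, k = Phi.shape
    d = np.ones(S) / S if weights is None else weights
    # maximize d^T Phi theta  <=>  minimize -(Phi^T d)^T theta
    c = -(Phi.T @ d)
    # Bellman inequality: Phi theta <= r_pi + gamma P_pi Phi theta
    # rewritten as (I - gamma P_pi) Phi theta <= r_pi
    A_ub = (np.eye(S) - gamma * P_pi) @ Phi
    res = linprog(c, A_ub=A_ub, b_ub=r_pi, bounds=[(None, None)] * k)
    return Phi @ res.x  # pointwise lower bound on V^pi

# Tiny two-state example; tabular (identity) features recover V^pi exactly.
P_pi = np.array([[0.9, 0.1], [0.2, 0.8]])
r_pi = np.array([1.0, 0.0])
Phi = np.eye(2)
V_lower = bellman_constrained_eval(P_pi, r_pi, Phi)
V_true = np.linalg.solve(np.eye(2) - 0.9 * P_pi, r_pi)
assert np.all(V_lower <= V_true + 1e-6)
```

With restricted (non-tabular) features the program returns the best lower-bounding estimate expressible in the feature class rather than V^pi itself, which is the sense in which such a constraint set trades approximation power for a one-sided guarantee.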