Vision-based autonomous river tracking by unmanned aerial vehicles (UAVs) in environments with unreliable GPS signals is crucial for applications such as rescue, surveillance, and environmental monitoring. This safety-critical navigation task requires optimizing performance while satisfying stringent safety constraints. The river-tracking reward depends on the river segments already visited and is therefore non-Markovian, which poses challenges for standard safe reinforcement learning (SafeRL). To address these challenges, we first introduce Marginal Gain Advantage Estimation (MGAE), which refines the reward advantage function with a sliding-window baseline computed from past episode returns to accommodate the non-Markovian dynamics. Second, we develop a Semantic Dynamics Model (SDM) based on patched water semantic masks, which provides more interpretable and data-efficient short-term forecasts of future observations than latent visual dynamics models. Third, we present the Constrained Actor Dynamics Estimator (CADE) architecture, which integrates actors, a cost estimator, and the SDM into a model-based SafeRL framework for cost advantage estimation.
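To make the sliding-window baseline idea behind MGAE more concrete, the following is a minimal sketch rather than the estimator defined in this work. It assumes the baseline is the plain mean of the most recent episode returns and that the gain is computed once per episode; the class name `SlidingWindowBaseline`, the method `marginal_gain`, and the window size of 3 are all hypothetical choices made for illustration.

```python
from collections import deque


class SlidingWindowBaseline:
    """Hypothetical baseline: mean of the last `window` episode returns."""

    def __init__(self, window: int = 3):
        # deque with maxlen drops the oldest return automatically.
        self.returns = deque(maxlen=window)

    def value(self) -> float:
        # With no history yet, fall back to a zero baseline (an assumption).
        if not self.returns:
            return 0.0
        return sum(self.returns) / len(self.returns)

    def marginal_gain(self, episode_return: float) -> float:
        # Compare the new return with the baseline from past episodes only,
        # then add it to the window for future comparisons.
        gain = episode_return - self.value()
        self.returns.append(episode_return)
        return gain


baseline = SlidingWindowBaseline(window=3)
gains = [baseline.marginal_gain(r) for r in [10.0, 12.0, 11.0, 15.0, 9.0]]
```

In this sketch a positive gain means the episode outperformed the recent average, so the signal tracks improvement relative to recent behavior rather than relative to a state-value estimate, which may be unreliable when the reward is non-Markovian.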