This paper proposes a process compensation model that provides step-by-step feedback to address the problem of supervising the validity of intermediate-level inference in models that utilize multi-step inference strategies. Existing process compensation models lack explanations and rely on supervised learning using static datasets, resulting in limited generalization (T17685). In this paper, we reframe step-by-step compensation modeling as an inference task rather than a classification task, and propose a generative judge that infers the inference steps of a policy model. The proposed model, StepWiser, is trained using reinforcement learning using the relative outcomes of rollouts, and demonstrates improved intermediate-level judgment accuracy, improved policy modeling during training, and improved inference-time search compared to existing methods.