ConsistRM: Improving Generative Reward Models via Consistency-Aware Self-Training