Evidence-Augmented Policy Optimization with Reward Co-Evolution for Long-Context Reasoning