In this paper, we propose a differentiated reward method based on a steady-state transition system to address the degradation of sample efficiency in reinforcement learning (RL) for multi-vehicle cooperative driving strategy optimization. By analyzing traffic flow characteristics and incorporating state-transition gradient information into the reward design, the method improves action selection and policy learning in multi-vehicle cooperative decision-making. We verify the proposed method with RL algorithms including MAPPO, MADQN, and QMIX in environments with varying ratios of autonomous vehicles. The results show that learning convergence is significantly accelerated and that the method outperforms existing centralized reward schemes in terms of traffic efficiency, safety, and behavioral rationality. It also exhibits strong scalability and environmental adaptability, offering a new approach to multi-agent cooperative decision-making in complex traffic environments.
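To make the idea of a differentiated, transition-based reward concrete, the sketch below shows one possible form in Python: each vehicle receives a shared traffic-flow term plus an individual term derived from its own state-transition gradient toward an assumed steady state. The function name, the weights, and the use of free-flow speed as the steady-state target are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def differentiated_reward(prev_speeds, curr_speeds, v_free=30.0,
                          w_flow=0.5, w_grad=0.5):
    """Per-agent reward combining a shared traffic-flow term with a
    per-vehicle state-transition (gradient) term.

    prev_speeds, curr_speeds: speeds (m/s) of the controlled vehicles at
    two consecutive steps. Weights and the free-flow speed are placeholders.
    """
    prev_speeds = np.asarray(prev_speeds, dtype=float)
    curr_speeds = np.asarray(curr_speeds, dtype=float)

    # Shared term: mean speed relative to free-flow speed (traffic efficiency).
    flow_term = curr_speeds.mean() / v_free

    # Differentiated term: how much each vehicle's latest transition reduced
    # its deviation from the assumed steady state (free-flow speed); positive
    # when the vehicle moves toward the steady state, negative otherwise.
    prev_dev = np.abs(prev_speeds - v_free)
    curr_dev = np.abs(curr_speeds - v_free)
    grad_term = (prev_dev - curr_dev) / v_free

    # Each agent receives the shared flow reward plus its own transition
    # credit, rather than one identical centralized reward for all agents.
    return w_flow * flow_term + w_grad * grad_term

# Two vehicles: the first accelerates toward free flow, the second drifts away,
# so the first receives a larger reward than the second.
print(differentiated_reward([20.0, 28.0], [24.0, 25.0]))
```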