This paper presents a method that uses reinforcement learning to train a single $d$-dimensional steering vector per layer while keeping the base model weights frozen. The method achieves performance comparable to a fully RL-fine-tuned reasoning model on a mathematical reasoning task. The added parameters amount to only about 0.0016% of an 8-billion-parameter model, and the results are reproducible across a variety of base models and mathematical reasoning benchmarks. These findings narrow the upper bound on the parameter budget required to elicit strong chain-of-thought reasoning, suggesting that millions of adapter weights are unnecessary. The minimal set of trainable parameters also shrinks optimizer state and the associated memory traffic on the GPU, lowering the overall cost of fine-tuning. Furthermore, logit-lens analysis shows that the learned vectors amplify a consistent set of token directions, providing clear insight into the model's internal computation.
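The sketch below illustrates the core idea under stated assumptions: it is not the authors' implementation, but a minimal PyTorch example of training one steering vector per layer added to the residual stream while all base weights stay frozen. The module layout (`model.model.layers`), hook placement, and the parameter-count arithmetic in the comments are illustrative assumptions.

```python
# Minimal sketch (not the paper's code): one trainable steering vector per
# transformer layer, added to that layer's output, with base weights frozen.
import torch
import torch.nn as nn

class SteeringVectors(nn.Module):
    def __init__(self, num_layers: int, hidden_size: int):
        super().__init__()
        # One d-dimensional vector per layer; roughly num_layers * hidden_size
        # trainable parameters in total (e.g., 32 * 4096 = 131,072, which is
        # about 0.0016% of an 8B-parameter model).
        self.vectors = nn.Parameter(torch.zeros(num_layers, hidden_size))

def attach_steering(model: nn.Module, steering: SteeringVectors) -> None:
    """Freeze the base model and add each layer's steering vector to its output."""
    for p in model.parameters():
        p.requires_grad_(False)  # base weights stay fixed

    def make_hook(layer_idx: int):
        def hook(module, inputs, output):
            hidden = output[0] if isinstance(output, tuple) else output
            # Broadcast the per-layer vector over batch and sequence dimensions.
            hidden = hidden + steering.vectors[layer_idx]
            return (hidden, *output[1:]) if isinstance(output, tuple) else hidden
        return hook

    # Assumes a LLaMA-style decoder exposed as `model.model.layers`;
    # other architectures would need a different hook point.
    for i, layer in enumerate(model.model.layers):
        layer.register_forward_hook(make_hook(i))

# Only the steering vectors are handed to the optimizer; the RL objective
# (e.g., a policy-gradient loss on sampled reasoning traces) is applied as usual.
# optimizer = torch.optim.AdamW(steering.parameters(), lr=1e-3)
```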