In this paper, we present Activation-Steered Compression (ASC), a method for curbing excessively long chains of thought (CoTs) generated by large language models (LLMs) during reasoning. ASC extracts a "steering vector" from the difference, in the model's activation space, between concise, mathematically dense reasoning and verbose, English-heavy reasoning, and injects it to compress the reasoning trace. The technique directly modifies hidden representations at inference time, shortening CoT length without any retraining. A KL-divergence-bounded constraint provides a theoretically grounded way to calibrate the steering strength; with it, ASC achieves up to 67.43% CoT length reduction on the MATH500 and GSM8K datasets while maintaining accuracy. Notably, it delivers an average 2.73x speedup on an 8B model, suggesting that ASC is a practical and efficient tool for deploying reasoning-capable LLMs in latency- and cost-sensitive environments.
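The core recipe, a contrastive mean-difference vector injected into a decoder layer via a forward hook, can be sketched as follows. This is a minimal illustrative sketch rather than the authors' implementation: the model name, layer index, steering strength `ALPHA`, and toy prompt pairs are placeholder assumptions, and the paper's KL-divergence-based calibration of the strength is omitted.

```python
# Illustrative sketch of activation steering for CoT compression; not the authors'
# reference code. Assumes a HuggingFace causal LM with Llama-style decoder layers
# at model.model.layers. Layer index, ALPHA, and prompts are placeholder assumptions.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_NAME = "meta-llama/Llama-3.1-8B-Instruct"  # assumption: any Llama-style model
LAYER_IDX = 20   # assumption: which decoder layer to steer
ALPHA = 4.0      # steering strength; the paper calibrates this via a KL-divergence bound

tok = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME, torch_dtype=torch.bfloat16)
model.eval()

def mean_activation(prompts, layer_idx):
    """Average hidden state at `layer_idx`, taken at the final token of each prompt."""
    acts = []
    with torch.no_grad():
        for p in prompts:
            ids = tok(p, return_tensors="pt").input_ids
            out = model(ids, output_hidden_states=True)
            # hidden_states[layer_idx + 1] is the output of decoder layer `layer_idx`
            acts.append(out.hidden_states[layer_idx + 1][0, -1])
    return torch.stack(acts).mean(dim=0)

# Contrastive examples: concise, math-dense solutions vs. verbose English ones (toy data).
concise_examples = ["Q: 12*7? A: 12*7 = 84."]
verbose_examples = ["Q: 12*7? A: Let us think step by step. Twelve times seven means ..."]

# Steering vector = mean(concise activations) - mean(verbose activations).
steering_vec = (mean_activation(concise_examples, LAYER_IDX)
                - mean_activation(verbose_examples, LAYER_IDX))

def steer_hook(module, inputs, output):
    """Add the scaled steering vector to the layer's hidden states at every decoding step."""
    hidden = output[0] + ALPHA * steering_vec.to(output[0].dtype)
    return (hidden,) + tuple(output[1:])

handle = model.model.layers[LAYER_IDX].register_forward_hook(steer_hook)
try:
    ids = tok("Solve: what is 37 + 58?", return_tensors="pt").input_ids
    gen = model.generate(ids, max_new_tokens=128)
    print(tok.decode(gen[0], skip_special_tokens=True))
finally:
    handle.remove()  # detach the hook so subsequent generations are unsteered
```

Because the intervention is a single vector addition per layer call, it adds negligible compute per token; the latency gains reported above come entirely from the shorter CoTs it induces.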