This paper proposes a novel pruning technique, COMPACT, to improve the efficiency of large language models (LLMs). COMPACT (i) shrinks the embedding and LM-head layers by removing rare vocabulary tokens, and (ii) prunes the intermediate channels of the feed-forward network (FFN) using common-token-weighted activations. The approach aims to reduce memory usage, latency, and cost while preserving the standard transformer architecture. Experimental results on Qwen, LLaMA, and Gemma models (0.5B-70B) show that COMPACT substantially reduces parameter count, GPU memory, and latency while maintaining state-of-the-art performance.
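
To make the two pruning steps concrete, the sketch below illustrates one plausible way they could be realized in PyTorch: dropping the least-frequent token rows from an embedding matrix, and ranking FFN intermediate channels by token-frequency-weighted activation magnitude on a calibration batch. All names (`prune_vocab`, `prune_ffn_channels`, `keep`, `token_weights`) and the choice of ReLU scoring are assumptions for illustration, not the authors' implementation.

```python
# Hypothetical sketch of vocabulary and FFN-channel pruning; not the paper's code.
import torch
import torch.nn as nn


def prune_vocab(embedding: nn.Embedding, token_counts: torch.Tensor, keep: int):
    """Keep the `keep` most frequent token rows of the embedding matrix."""
    kept_ids = torch.topk(token_counts, k=keep).indices.sort().values
    new_emb = nn.Embedding(keep, embedding.embedding_dim)
    new_emb.weight.data.copy_(embedding.weight.data[kept_ids])
    # Map old token ids to new ids so inputs can be re-indexed consistently.
    remap = {int(old): new for new, old in enumerate(kept_ids.tolist())}
    return new_emb, remap


def prune_ffn_channels(up: nn.Linear, down: nn.Linear,
                       calib_hidden: torch.Tensor,
                       token_weights: torch.Tensor, keep: int):
    """Score each FFN intermediate channel by token-weighted activation
    magnitude on calibration hidden states, then keep the top `keep` channels
    of the up- and down-projections."""
    acts = torch.relu(up(calib_hidden))              # (num_tokens, d_ff); ReLU assumed
    scores = (token_weights.unsqueeze(1) * acts.abs()).sum(dim=0)
    kept = torch.topk(scores, k=keep).indices.sort().values
    new_up = nn.Linear(up.in_features, keep, bias=up.bias is not None)
    new_down = nn.Linear(keep, down.out_features, bias=down.bias is not None)
    new_up.weight.data.copy_(up.weight.data[kept])        # rows = output channels
    new_down.weight.data.copy_(down.weight.data[:, kept])  # cols = input channels
    if up.bias is not None:
        new_up.bias.data.copy_(up.bias.data[kept])
    if down.bias is not None:
        new_down.bias.data.copy_(down.bias.data)
    return new_up, new_down
```

Because both steps only slice existing weight matrices, the pruned model keeps the standard transformer layout and can be loaded and served with unmodified inference code, which is consistent with the architecture-preserving claim above.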