To address the high computational cost of excessive token generation in large reasoning models, this paper proposes LessIsMore, a training-free sparse attention mechanism. Instead of relying on traditional head-specific local optimizations, LessIsMore leverages global attention patterns: it aggregates token selections across attention heads and combines them with recent contextual information to produce a unified cross-head token ranking for future decoding layers. This eliminates the need to maintain separate token subsets per head, improving both generalization and efficiency. Evaluations on diverse reasoning tasks and benchmarks show that LessIsMore maintains, and in some cases improves, accuracy while achieving an average 1.1x decoding speedup over full attention. Moreover, by attending to 2x fewer tokens without loss of accuracy, LessIsMore achieves a 1.13x end-to-end speedup over existing sparse attention methods.
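To make the cross-head selection idea concrete, the sketch below shows one plausible way to aggregate per-head attention scores into a single shared token set that also preserves a window of recent tokens. It is a minimal illustration under assumed tensor shapes and hypothetical parameters (`budget`, `recent_window`), not the authors' implementation.

```python
import torch


def unified_token_selection(attn_scores: torch.Tensor,
                            budget: int,
                            recent_window: int) -> torch.Tensor:
    """Select one shared set of token indices for all attention heads.

    attn_scores:   [num_heads, seq_len] attention weights of the current
                   query against all cached tokens, per head (assumed shape).
    budget:        total number of tokens to keep.
    recent_window: number of most recent tokens that are always retained.
    """
    num_heads, seq_len = attn_scores.shape

    # Aggregate head-specific scores into a single global ranking.
    # (Summation is one choice; mean or max would be equally plausible.)
    global_scores = attn_scores.sum(dim=0)

    # Always keep the most recent tokens to preserve local context.
    recent = torch.arange(seq_len - recent_window, seq_len)

    # Exclude the recent window from ranked selection, then fill the
    # remaining budget from the top of the global ranking.
    global_scores[recent] = float("-inf")
    topk = torch.topk(global_scores, k=budget - recent_window).indices

    # The union is a single token set shared by every head and, per the
    # abstract, reusable for subsequent decoding layers.
    return torch.cat([topk, recent]).sort().values
```

Because all heads share one ranking, the key-value cache only needs a single gathered subset per layer rather than one per head, which is where the decoding-time savings described above would come from.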