This paper proposes ZeroTuning, a novel training-free method that improves LLM performance by lightly biasing attention to the initial tokens, overcoming limitations of token-level attention tuning methods such as Post-hoc Attention Steering (PASTA) and Attention Calibration (ACT). We theoretically demonstrate that biasing the initial tokens modulates the entropy of the downstream attention distribution, particularly in the early layers, and that different attention heads exhibit different scaling preferences. ZeroTuning applies head-specific attention tuning to the initial tokens so as to minimize the model's output entropy, and can be implemented by modifying as few as four lines of the LlamaAttention code. We present two variants (supervised and unsupervised) and demonstrate performance superior to existing methods on 15 datasets. With the Llama-3.1-8B model, we achieve relative performance gains of 19.9% on classification tasks, 4.5% on question answering tasks, and 2.1% on conversational tasks, while maintaining performance under quantized inference and long context lengths.
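To make the mechanism concrete, below is a minimal sketch of how head-specific scaling of attention to the initial token could be inserted into a Llama-style attention forward pass. The function name `tune_initial_token_attention` and the `head_scales` parameter are hypothetical illustrations, not the paper's exact patch; the sketch assumes post-softmax scaling followed by renormalization.

```python
import torch


def tune_initial_token_attention(attn_weights: torch.Tensor,
                                 head_scales: torch.Tensor) -> torch.Tensor:
    """Rescale each head's attention to the initial (position-0) token and renormalize.

    attn_weights: post-softmax attention, shape (batch, num_heads, q_len, kv_len).
    head_scales:  per-head scaling factors, shape (num_heads,); values > 1 upweight
                  the initial token, values < 1 downweight it (hypothetical parameter).
    """
    scaled = attn_weights.clone()
    # Scale the column corresponding to the initial token, separately per head.
    scaled[..., 0] = scaled[..., 0] * head_scales.view(1, -1, 1)
    # Renormalize so each query's attention weights still sum to 1.
    return scaled / scaled.sum(dim=-1, keepdim=True)


# Hypothetical insertion point inside LlamaAttention.forward, right after the softmax:
#   attn_weights = torch.nn.functional.softmax(attn_scores, dim=-1)
#   attn_weights = tune_initial_token_attention(attn_weights, self.head_scales)
```

In this reading, the per-head factors would be chosen (via the supervised or unsupervised variant) to minimize the model's output entropy, consistent with the description above.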