This paper proposes RetroAttention, a novel KV cache update technique that addresses the slowdown of large language model (LLM) inference on long-generation tasks (e.g., reasoning, code generation, and multi-turn dialogue). Unlike existing KV cache compression methods, which focus primarily on the input context, RetroAttention mitigates accumulated attention errors by updating past attention outputs with newly arrived KV entries during subsequent decoding steps. By maintaining a lightweight output cache, it allows past queries to efficiently access more relevant context while incurring minimal latency overhead. This breaks the fixed-attention-output paradigm and enables continual refinement of prior approximations. Extensive experiments on long-text generation benchmarks demonstrate that RetroAttention consistently outperforms state-of-the-art (SOTA) KV compression methods, improving effective KV exposure by up to 1.6x and accuracy by up to 21.9%.
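To make the core idea concrete, below is a minimal sketch of how a cached attention output for a past query could be folded together with newly arrived KV entries, using the standard online-softmax merge identity (as in streaming attention). The cache layout (per-query max logit, normalizer, and weighted value sum) and the names `OutputCacheEntry` and `retro_update` are illustrative assumptions for exposition, not the paper's actual implementation.

```python
# Sketch: retroactively refreshing a past query's attention output when new
# KV entries become available, via the online-softmax merge identity.
# Assumed cache layout and names; not the paper's implementation.
import torch

class OutputCacheEntry:
    """Running softmax statistics for one past query's attention output."""
    def __init__(self, q, k, v, scale):
        # q: (d,), k: (n, d), v: (n, d)
        self.q, self.scale = q, scale
        logits = (k @ q) * scale            # (n,) attention logits
        self.m = logits.max()               # running max logit (for stability)
        w = torch.exp(logits - self.m)      # unnormalized attention weights
        self.l = w.sum()                    # running softmax normalizer
        self.o = w @ v                      # running weighted sum of values

    def output(self):
        # Current (approximate) attention output for this past query.
        return self.o / self.l

    def retro_update(self, k_new, v_new):
        """Fold newly arrived KV entries into the cached attention output."""
        logits = (k_new @ self.q) * self.scale
        m_new = torch.maximum(self.m, logits.max())
        alpha = torch.exp(self.m - m_new)   # rescale the old statistics
        w = torch.exp(logits - m_new)
        self.l = alpha * self.l + w.sum()
        self.o = alpha * self.o + w @ v_new
        self.m = m_new

# Usage: the merged output equals exact attention over old + new KV entries.
torch.manual_seed(0)
d = 16
q = torch.randn(d)
k_old, v_old = torch.randn(8, d), torch.randn(8, d)
k_new, v_new = torch.randn(4, d), torch.randn(4, d)
scale = d ** -0.5

entry = OutputCacheEntry(q, k_old, v_old, scale)
entry.retro_update(k_new, v_new)

k_all, v_all = torch.cat([k_old, k_new]), torch.cat([v_old, v_new])
ref = torch.softmax((k_all @ q) * scale, dim=0) @ v_all
assert torch.allclose(entry.output(), ref, atol=1e-5)
```

Because only a few scalars and one vector per past query need to be stored, such a merge keeps the output cache lightweight while letting previously computed attention outputs incorporate context that arrived after they were first produced.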