To improve the reasoning performance of Transformer LLMs, we propose a "bottleneck transformer" architecture that rewrites the memory (KV) cache during inference. Mimicking the brain's memory (re)consolidation process and drawing on information bottleneck theory, the architecture compresses the KV cache while retaining important information, with the aim of improving generalization. A secondary transformer, the Cache Processor, integrates new KV entries and selectively reintegrates a subset of past entries. On mathematical reasoning benchmarks, the proposed model consistently outperforms existing Transformer and pause-token-based models.
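
As a rough illustration of how such a Cache Processor might operate, the PyTorch sketch below merges new KV entries with the existing cache, passes them through a small secondary transformer, and keeps only a fixed budget of slots ranked by a learned importance score. Apart from the module name, everything here is an assumption made for the sketch: the joint key-value representation, the dimensions, and the top-k selection rule are illustrative and not taken from the proposed architecture.

```python
import torch
import torch.nn as nn


class CacheProcessor(nn.Module):
    """Hypothetical sketch of a secondary transformer that rewrites the KV cache.

    New KV entries are concatenated with the existing cache, reconsolidated by a
    small transformer encoder, and pruned to `cache_budget` slots using a learned
    importance score. All hyperparameters and the scoring rule are assumptions.
    """

    def __init__(self, d_model: int = 256, n_heads: int = 4,
                 n_layers: int = 2, cache_budget: int = 128):
        super().__init__()
        layer = nn.TransformerEncoderLayer(
            d_model=2 * d_model,          # keys and values processed jointly per slot
            nhead=n_heads,
            batch_first=True,
        )
        self.encoder = nn.TransformerEncoder(layer, num_layers=n_layers)
        self.scorer = nn.Linear(2 * d_model, 1)   # importance of each cache slot
        self.cache_budget = cache_budget
        self.d_model = d_model

    def forward(self, cache_k, cache_v, new_k, new_v):
        # Merge the old cache with the new KV entries along the sequence axis.
        k = torch.cat([cache_k, new_k], dim=1)    # (batch, slots, d_model)
        v = torch.cat([cache_v, new_v], dim=1)
        kv = torch.cat([k, v], dim=-1)            # joint key-value representation

        # Let the secondary transformer "reconsolidate" the cache contents.
        kv = self.encoder(kv)

        # Keep only the highest-scoring slots, enforcing the cache budget.
        scores = self.scorer(kv).squeeze(-1)      # (batch, slots)
        budget = min(self.cache_budget, kv.size(1))
        top = scores.topk(budget, dim=1).indices.unsqueeze(-1)
        kv = torch.gather(kv, 1, top.expand(-1, -1, kv.size(-1)))

        new_cache_k, new_cache_v = kv.split(self.d_model, dim=-1)
        return new_cache_k, new_cache_v


if __name__ == "__main__":
    proc = CacheProcessor()
    cache_k, cache_v = torch.randn(1, 120, 256), torch.randn(1, 120, 256)
    new_k, new_v = torch.randn(1, 16, 256), torch.randn(1, 16, 256)
    k, v = proc(cache_k, cache_v, new_k, new_v)
    print(k.shape, v.shape)   # both torch.Size([1, 128, 256])
```

In this reading, the cache budget acts as the information bottleneck: the main model only ever attends over the compressed, rewritten cache rather than the full history.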