Daily Arxiv

This page organizes artificial-intelligence papers published around the world.
Summaries are generated with Google Gemini, and the page is operated on a non-profit basis.
Copyright of each paper belongs to its authors and their institutions; when sharing, please cite the source.

LayerCake: Token-Aware Contrastive Decoding within Large Language Model Layers

Created by
  • Haebom

Author

Jingze Zhu, Yongliang Wu, Wenbo Zhu, Jiawang Cao, Yanqiang Zheng, Jiawei Chen, Xu Yang, Bernt Schiele, Jonas Fischer, Xinting Hu

Outline

Large language models (LLMs) excel at understanding and generating natural language, but their vulnerability to factual errors limits their reliability in knowledge-intensive tasks. While decoding-time strategies offer an efficient remedy that requires no training, existing methods treat token-level and layer-level signals in isolation, overlooking their joint dynamics. This work presents a token-aware, layer-localized contrastive decoding method that improves factual generation by aligning specific token types with their most influential transformer layers. Empirical attention analysis identifies two key patterns: punctuation tokens receive dominant attention in early layers, while conceptual tokens govern semantic reasoning in intermediate layers. Selectively suppressing attention to these token types at the corresponding depths induces a controlled factual degradation, from which contrastive signals are derived to guide the final factual decoding. The method requires no additional training or model modification, and experiments show that it consistently improves factuality across multiple LLMs and various benchmarks.
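To make the mechanism concrete, below is a minimal PyTorch sketch of the two ingredients described above; it is not the authors' implementation. The function names (`suppress_attention`, `contrastive_next_token`), the (1 + α)·expert − α·degraded combination, and the plausibility cutoff τ follow a standard contrastive-decoding recipe and are assumptions here; the paper's exact layer choices and contrast formula may differ.

```python
import torch
import torch.nn.functional as F

def suppress_attention(scores: torch.Tensor, target_keys: torch.Tensor) -> torch.Tensor:
    """Mask pre-softmax attention scores so that no query can attend to the
    targeted key positions (e.g., punctuation tokens in early layers,
    conceptual tokens in intermediate layers).

    scores:      [batch, heads, q_len, k_len] pre-softmax attention logits
    target_keys: [batch, k_len] boolean, True at positions to suppress
    """
    mask = target_keys[:, None, None, :]  # broadcast over heads and queries
    return scores.masked_fill(mask, torch.finfo(scores.dtype).min)

def contrastive_next_token(expert_logits: torch.Tensor,
                           degraded_logits: torch.Tensor,
                           alpha: float = 1.0,
                           tau: float = 0.1) -> torch.Tensor:
    """One greedy decoding step: amplify what the intact model predicts
    relative to the attention-suppressed ("factually degraded") run.
    An adaptive plausibility cutoff keeps the subtraction from promoting
    tokens the intact model itself finds implausible."""
    probs = F.softmax(expert_logits, dim=-1)
    plausible = probs >= tau * probs.max(dim=-1, keepdim=True).values
    contrast = (1 + alpha) * expert_logits - alpha * degraded_logits
    return contrast.masked_fill(~plausible, float("-inf")).argmax(dim=-1)

# Toy usage with random logits over an 8-token vocabulary.
expert = torch.randn(1, 8)
degraded = torch.randn(1, 8)
print(contrastive_next_token(expert, degraded))
```

In a full decoding loop, the model would be run twice per step, once intact and once with `suppress_attention` applied at the chosen early/intermediate layers, and the two resulting logit streams fed to `contrastive_next_token`.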

Takeaways, Limitations

Takeaways:
  • Presents a novel approach to the factuality problem of LLMs by considering the joint dynamics between token-level and layer-level signals.
  • Improves factuality across a variety of LLMs without additional training or model modification.
  • Analyzes the attention patterns of punctuation and conceptual tokens and uses them to design the methodology.
  • Derives contrastive signals by inducing a controlled factual degradation.
Limitations:
  • Because the method relies on attention-pattern analysis for specific token types (punctuation and conceptual tokens), generalization to other token types or model architectures may be limited.
  • Performance may be specific to the tested LLMs and benchmarks; applicability to other domains needs further verification.
  • Further analysis is needed on how the attention-suppression mechanism affects other abilities of LLMs (e.g., fluency, creativity).