Daily Arxiv

This page collects papers on artificial intelligence published around the world.
Summaries are generated with Google Gemini, and the page is operated on a non-profit basis.
Copyright for each paper belongs to its authors and their institutions; when sharing, simply cite the source.

Speculative Safety-Aware Decoding

Created by
  • Haebom

Author

Xuekang Wang, Shengyu Zhu, Xueqi Cheng

Speculative Safety-Aware Decoding (SSD)

Outline

Despite efforts to align large language models (LLMs) with human values and safety rules, jailbreak attacks that exploit remaining vulnerabilities persist. To defend against such attacks, this paper proposes Speculative Safety-Aware Decoding (SSD), a lightweight decoding-time method that equips an LLM with an additional safety property while also accelerating inference. SSD leverages a small language model that already possesses the desired safety property, integrates speculative sampling into the decoding process, and quantifies jailbreak risk via the agreement ratio between the small model and the composite model. Based on this ratio, SSD dynamically switches decoding strategies to prioritize either utility or safety, which also mitigates the capacity mismatch between the two models. Output tokens are then sampled from a new distribution that combines the distributions of the original model and the small model.
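
To make the mechanism concrete, here is a minimal, hypothetical sketch of one SSD-style decoding step in PyTorch. The names (ssd_step, large_logits_fn, small_logits_fn), the thresholding rule, and the linear mixture used in the safety branch are illustrative assumptions, not the paper's exact formulation; they only show how drafting, verification, risk estimation via the agreement ratio, and sampling from a combined distribution could fit together.

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def ssd_step(large_logits_fn, small_logits_fn, tokens,
             draft_len=4, risk_threshold=0.6, alpha=0.7):
    """One hypothetical SSD-style decoding step.

    large_logits_fn / small_logits_fn map a 1-D token-id tensor to
    next-token logits for the target LLM and the small safety-aligned
    model (both stand-ins here). Returns the tokens emitted this step.
    """
    # 1) Draft a few tokens cheaply with the small, safety-aligned model.
    draft, q_probs = tokens, []
    for _ in range(draft_len):
        q = F.softmax(small_logits_fn(draft), dim=-1)
        nxt = torch.multinomial(q, 1)
        q_probs.append(q)
        draft = torch.cat([draft, nxt])

    # 2) Verify drafts with the large model via the standard speculative
    #    sampling test: accept token x with probability min(1, p(x)/q(x)).
    accepted = 0
    for i in range(draft_len):
        prefix = draft[: tokens.numel() + i]
        p = F.softmax(large_logits_fn(prefix), dim=-1)
        x = draft[tokens.numel() + i]
        if torch.rand(()) < torch.clamp(p[x] / q_probs[i][x], max=1.0):
            accepted += 1
        else:
            break

    # 3) The agreement (acceptance) ratio doubles as a jailbreak-risk
    #    proxy: a safety-aligned drafter diverges from the large model on
    #    harmful continuations, so low agreement signals elevated risk.
    agreement = accepted / draft_len
    if agreement >= risk_threshold:
        # Utility mode: keep the verified draft tokens.
        return draft[tokens.numel(): tokens.numel() + accepted]

    # 4) Safety mode: sample one token from a combined distribution of the
    #    two models (a simple linear mixture is assumed here).
    prefix = draft[: tokens.numel()]
    p = F.softmax(large_logits_fn(prefix), dim=-1)
    q = F.softmax(small_logits_fn(prefix), dim=-1)
    mix = alpha * q + (1.0 - alpha) * p
    return torch.multinomial(mix, 1)

# Toy usage with a random stand-in for both models (100-token vocabulary):
logits_fn = lambda ids: torch.randn(100)
new_tokens = ssd_step(logits_fn, logits_fn, torch.tensor([1, 2, 3]))
```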

Takeaways, Limitations

Takeaways:
  • Successfully equips the LLM with the desired safety property.
  • Preserves model utility on harmless queries.
  • Accelerates inference through the speculative sampling design.
Limitations:
  • The small model must itself be equipped with the desired safety property beforehand.
  • Performance may vary with how the agreement-ratio threshold between the small and large models is set.