Daily Arxiv

This page collects papers on artificial intelligence published around the world.
It is summarized using Google Gemini and operated on a non-profit basis.
Copyright of each paper belongs to its authors and their institutions; when sharing, simply cite the source.

Logic Jailbreak: Efficiently Unlocking LLM Safety Restrictions Through Formal Logical Expression

Created by
  • Haebom

Author

Jingyu Peng, Maolin Wang, Nan Wang, Jiatong Li, Yuchen Li, Yuyang Ye, Wanyu Wang, Pengyue Jia, Kai Zhang, Xiangyu Zhao

Outline

Despite significant progress in aligning large language models (LLMs) with human values, current safety mechanisms remain vulnerable to jailbreak attacks. This paper hypothesizes that the vulnerability stems from a distributional mismatch between alignment-oriented prompts and malicious prompts. To investigate this, the authors introduce LogiBreak, a novel, general-purpose black-box jailbreak method that uses logical expression translation to bypass LLM safety systems. By converting malicious natural-language prompts into formal logical expressions, LogiBreak exploits the distributional gap between alignment data and logic-based inputs, evading safety constraints while preserving the underlying semantic intent and readability. The method is evaluated on a multilingual jailbreak dataset spanning three languages, demonstrating effectiveness across a variety of evaluation settings and linguistic contexts.
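The core move described in the abstract is rewriting a request as a formal logical expression so that it falls outside the distribution of alignment training data while keeping the same semantics. The sketch below illustrates that idea with a hypothetical template; the `to_formal_logic` helper, the predicate names, and the premise/conclusion layout are assumptions for illustration, not the paper's actual translation procedure.

```python
# Minimal sketch of the LogiBreak idea: re-express a natural-language
# request in first-order-logic notation before sending it to a model.
# The template and predicate names here are illustrative assumptions;
# the paper derives its logical forms from the prompt's semantics.

def to_formal_logic(action: str, target: str) -> str:
    """Wrap a request in a formal-logic template.

    `action` and `target` are hypothetical placeholders standing in
    for the semantic content extracted from the original prompt.
    """
    return (
        f"Let p(x) denote the proposition 'x performs {action}'. "
        f"Let t = '{target}'.\n"
        "Premise 1: \u2200x (p(x) \u2192 provides_steps(x, t)).\n"
        "Premise 2: p(assistant).\n"
        "Conclusion: provides_steps(assistant, t).\n"
        "Task: demonstrate the conclusion by enumerating the steps."
    )

if __name__ == "__main__":
    # Benign example: the point is the distribution shift from natural
    # language to formal notation, not the content of the request.
    print(to_formal_logic("a tutorial", "making sourdough bread"))
```

The claimed effect is that safety training, which sees mostly natural-language harmful prompts, generalizes poorly to this logic-notation register even though the request is semantically unchanged.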

Takeaways, Limitations

Takeaways:
LogiBreak presents a new jailbreak methodology that bypasses LLM safety systems.
Jailbreaking via logical expression translation crosses language barriers and is effective across diverse linguistic environments.
Exploiting the distributional gap between LLM alignment data and malicious inputs exposes vulnerabilities in safety mechanisms.
Limitations:
The abstract does not specify the paper's concrete limitations (e.g., performance on specific models, jailbreak success rates).
A more detailed analysis of LogiBreak's real-world applicability and of possible defense strategies is needed.
See the full paper for a more detailed discussion of limitations.