Daily Arxiv

This page organizes papers related to artificial intelligence published around the world.
This page is summarized using Google Gemini and is operated on a non-profit basis.
The copyright of the paper belongs to the author and the relevant institution. When sharing, simply cite the source.

QGuard:Question-based Zero-shot Guard for Multi-modal LLM Safety

Created by
  • Haebom

Author

Taegyeong Lee, Jeonghwa Yoo, Hyoungseo Cho, Soo Yong Kim, Yunho Maeng

QGuard: Question Prompting for LLM Safety

Outline

Advances in large-scale language models (LLMs) have impacted various fields, but have also increased the potential for malicious users to exploit harmful or jailbreak prompts. This paper proposes QGuard, a simple and effective security guard method that leverages question prompts to block harmful prompts. QGuard defends against both text-based and multimodal harmful prompt attacks, and is robust to modern harmful prompts without fine-tuning. Experimental results demonstrate that QGuard performs competitively on text-based and multimodal harmful datasets. Furthermore, question prompting analysis enables white-box analysis of user input.

Takeaways, Limitations

Takeaways:
A simple and effective way to block harmful prompts in a zero-shot manner.
Defends against both text and multimodal attacks.
Robust against the latest harmful prompts without fine tuning.
Providing white box analysis through question prompting analysis.
Provides insights into mitigating security risks in LLM services.
Limitations:
There is no Limitations specified in the paper.
👍