Daily Arxiv

This page curates AI-related papers published worldwide.
All content here is summarized using Google Gemini, and the page is operated on a non-profit basis.
Copyright for each paper belongs to the authors and their institutions; please make sure to credit the source when sharing.

When Good Sounds Go Adversarial: Jailbreaking Audio-Language Models with Benign Inputs

Created by
  • Haebom

Authors

Bodam Kim, Hiskias Dingeto, Taeyoun Kwon, Dasol Choi, Donggeon Lee, Haon Park, Jaehoon Lee, Jongho Shin

Overview

This paper identifies a vulnerability in audio-based interaction with large language models (LLMs) and introduces WhisperInject, a two-stage attack framework that exploits it. WhisperInject uses subtle audio perturbations that are imperceptible to human listeners to make state-of-the-art audio LLMs generate malicious content. In the first stage, reinforcement learning combined with projected gradient descent (RL-PGD) bypasses the model's safety protocols and elicits a malicious raw response. In the second stage, projected gradient descent (PGD) embeds that response into benign audio such as weather questions or greetings. Against Qwen2.5-Omni-3B, Qwen2.5-Omni-7B, and Phi-4-Multimodal, the attack achieves a success rate of over 86% under rigorous safety evaluation with StrongREJECT, LlamaGuard, and human evaluation. The work demonstrates that this audio-based threat is practical and stealthy rather than purely theoretical.
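For readers unfamiliar with projected gradient descent, the generic update can be sketched as follows. This is a minimal illustration and not the paper's pipeline: it swaps the audio LLM for a hypothetical toy linear scoring function (`W`, `TARGET`, `toy_loss`, and `toy_grad` are made up for the example), and it assumes an L-infinity bound `eps` on the perturbation, which the summary above does not specify.

```python
def sign(v):
    """Return -1, 0, or 1 according to the sign of v."""
    return (v > 0) - (v < 0)


def project_linf(delta, eps):
    """Clip each coordinate of delta into [-eps, eps] (L-infinity projection)."""
    return [max(-eps, min(eps, d)) for d in delta]


def pgd_linf(x, grad_fn, eps, step, iters):
    """Signed-gradient PGD: minimize a loss over delta with max|delta_i| <= eps.

    grad_fn(z) must return the gradient of the loss with respect to the input z.
    """
    delta = [0.0] * len(x)
    for _ in range(iters):
        perturbed = [xi + di for xi, di in zip(x, delta)]
        grad = grad_fn(perturbed)
        # Descent step in the direction of the negative gradient sign.
        delta = [di - step * sign(gi) for di, gi in zip(delta, grad)]
        # Projection step: pull the perturbation back inside the eps-ball.
        delta = project_linf(delta, eps)
    return delta


# Hypothetical toy stand-in for a model: a linear score with a squared-error loss.
W = [1.0, -2.0, 0.5]
TARGET = 0.5


def toy_loss(z):
    score = sum(wi * zi for wi, zi in zip(W, z))
    return (score - TARGET) ** 2


def toy_grad(z):
    score = sum(wi * zi for wi, zi in zip(W, z))
    return [2.0 * (score - TARGET) * wi for wi in W]


x = [0.2, 0.1, -0.3]  # placeholder "benign" input
delta = pgd_linf(x, toy_grad, eps=0.1, step=0.02, iters=20)
x_adv = [xi + di for xi, di in zip(x, delta)]
print("loss before:", toy_loss(x))
print("loss after: ", toy_loss(x_adv))
print("max |delta|:", max(abs(d) for d in delta))
```

In this toy setting the loss decreases while every coordinate of `delta` stays within `eps`, which is the trade-off PGD is designed to manage: the projection keeps the change small, and the gradient steps push the output toward the target.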

Takeaways, Limitations

Takeaways:
Uncovers a new vulnerability in audio-based LLM interactions.
Presents WhisperInject, a practical and covert method for manipulating audio LLMs.
Demonstrates a high success rate (over 86%) under rigorous safety evaluation.
Highlights the need to strengthen the security of audio-based LLMs.
Limitations:
The attack's effectiveness has so far been verified only on specific models.
Further research is needed on how robust the attack is across varied audio environments and noise.
Further research is needed on defenses against WhisperInject.