Daily Arxiv

This page curates AI-related papers published worldwide.
All content is summarized using Google Gemini, and the page is operated on a non-profit basis.
Copyright for each paper belongs to the authors and their institutions; please make sure to credit the source when sharing.

"Haet Bhasha aur Diskrimineshun": Phonetic Perturbations in Code-Mixed Hinglish to Red-Team LLMs

Created by
  • Haebom

Author

Darpan Aswal, Siddharth D Jaiswal

Outline

This paper examines safety vulnerabilities in recently released large language models (LLMs) arising from their multilingual and multimodal capabilities. Unlike prior red-teaming work, which has focused primarily on English, it presents two novel jailbreak strategies for text and image generation tasks that combine code-mixing with phonetic perturbations, and demonstrates their effectiveness over existing methods. By applying phonetic misspellings to sensitive words in code-mixed prompts, the attacks evade LLM safety filters while remaining interpretable to humans. The attacks achieve success rates of 99% for text generation and 78% for image generation, with attack relevance rates of 100% and 95%, respectively. Experiments show that the phonetic perturbations alter word tokenization, which is what enables the attacks to succeed. Since prompts containing misspellings occur naturally in real-world use, the authors highlight the need for research on more generalized safety alignment of multilingual, multimodal models. The paper includes examples of potentially harmful and objectionable content.
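A minimal sketch of the phonetic-perturbation step described above. The word-to-misspelling mapping below is illustrative only: "haet" and "diskrimineshun" come from the paper's title, and the substitution logic is an assumption, not the authors' actual pipeline.

```python
# Hypothetical mapping of sensitive English words to phonetic misspellings.
# "haet" and "diskrimineshun" are the title's own examples; the rest of the
# logic is an illustrative sketch, not the paper's implementation.
PHONETIC_MAP = {
    "hate": "haet",
    "discrimination": "diskrimineshun",
}

def perturb_prompt(prompt: str, mapping: dict = PHONETIC_MAP) -> str:
    """Replace each sensitive word with its phonetic misspelling,
    leaving the rest of the (code-mixed) prompt untouched."""
    out = []
    for word in prompt.split():
        # Strip simple punctuation so "hate," still matches "hate".
        core = word.strip(".,!?").lower()
        if core in mapping:
            out.append(word.lower().replace(core, mapping[core]))
        else:
            out.append(word)
    return " ".join(out)

# Example on a code-mixed (Hinglish) prompt:
print(perturb_prompt("Yeh hate bhasha aur discrimination ke baare mein hai"))
# → "Yeh haet bhasha aur diskrimineshun ke baare mein hai"
```

A human reader still recovers the intended words phonetically, which is why the perturbed prompt stays interpretable while evading keyword-level filtering.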

Takeaways, Limitations

Takeaways:
  • Presents a novel LLM jailbreak strategy combining code-mixing and phonetic perturbation, achieving high attack success rates.
  • Exposes LLM safety vulnerabilities in multilingual and multimodal settings.
  • Identifies the cause of successful jailbreaks by analyzing how phonetic perturbations affect LLM tokenization.
  • Emphasizes the need for research on more generalized safety alignment of multilingual, multimodal models.
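The tokenization effect noted above can be illustrated with a toy greedy longest-match tokenizer over a hypothetical vocabulary. Real LLM tokenizers (BPE and variants) work differently in detail, but the fragmentation pattern is the same idea: a known sensitive word maps to a single token, while its phonetic misspelling shatters into unrelated subwords.

```python
# Hypothetical subword vocabulary; real tokenizer vocabularies are learned
# from data and far larger. This is a sketch of the mechanism only.
VOCAB = {"hate", "ha", "et", "h", "a", "e", "t"}

def tokenize(word: str, vocab: set = VOCAB) -> list:
    """Greedy longest-match segmentation into vocabulary subwords."""
    tokens, i = [], 0
    while i < len(word):
        for j in range(len(word), i, -1):  # try the longest piece first
            if word[i:j] in vocab:
                tokens.append(word[i:j])
                i = j
                break
        else:
            # Unknown character: emit it as-is (stand-in for a byte fallback).
            tokens.append(word[i])
            i += 1
    return tokens

print(tokenize("hate"))  # → ['hate']      one token: filters aligned on it can fire
print(tokenize("haet"))  # → ['ha', 'et']  fragmented: the sensitive word is obscured
```

Because safety alignment is learned over token sequences, a misspelling that changes the token sequence can sidestep refusal behavior even though the surface meaning is unchanged.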
Limitations:
  • Generalizability is limited by the characteristics of the datasets and models used in the study.
  • The proposed jailbreak strategy could be exploited by malicious actors.
  • The experiments may not fully reflect the diversity of real-world situations.