Daily Arxiv

This page curates AI-related papers published worldwide.
All summaries are generated with Google Gemini, and the site is operated on a non-profit basis.
Copyright for each paper belongs to the authors and their institutions; please make sure to credit the source when sharing.

Multilingual Collaborative Defense for Large Language Models

Created by
  • Haebom

Authors

Hongliang Li, Jinan Xu, Gengping Cui, Changhao Guan, Fengran Mo, Kaiyu Huang

Outline

This paper explores a vulnerability in large language models (LLMs): "jailbreak" attacks that bypass LLM safety measures by translating malicious queries into rare or underrepresented languages. The authors point out that LLM safety in multilingual settings has received little prior research, and they propose a new training method, Multilingual Collaborative Defense (MCD), which automatically optimizes a continuous, soft safety prompt to improve multilingual LLM safety. MCD offers three key advantages: better safety performance across languages, strong generalization, and low false refusal rates. It also mitigates the safety inconsistencies across languages that arise from imbalanced LLM training corpora. To evaluate MCD's effectiveness and transferability, the authors adapt existing benchmarks such as MaliciousInstruct and AdvBench to include underrepresented languages, and they show that MCD outperforms existing methods. The code is available on GitHub.
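The core idea behind a soft safety prompt, learning continuous prompt vectors by gradient descent while the model itself stays frozen, can be illustrated with a deliberately tiny toy. This is a minimal sketch under loud assumptions, not MCD's actual objective or architecture: the 3-dimensional embeddings, the token names (`sw_*` is a placeholder for an underrepresented language), the frozen scorer `refusal_prob`, `NUM_PROMPT`, and the hyperparameters are all hypothetical. The frozen "model" lets each soft-prompt vector interact with the mean-pooled input through a dot product (a crude stand-in for attention), and only the prompt vectors are trained, jointly on harmful and benign examples from both languages.

```python
import math

# Hypothetical toy embeddings (not a real vocabulary). "sw_*" is a placeholder
# for an underrepresented language whose harmful content the frozen head barely notices.
EMB = {
    "en_harm": [1.0, 0.0, 0.0],
    "sw_harm": [0.2, 1.0, 0.0],
    "en_safe": [-1.0, 0.0, 0.0],
    "sw_safe": [-0.2, -1.0, 0.0],
    "filler": [0.0, 0.0, 1.0],
}
W = [4.0, 0.0, 0.0]  # frozen safety head: it mostly "sees" dimension 0
BIAS = -1.0          # frozen bias
NUM_PROMPT = 2       # number of learnable soft-prompt vectors (assumed)

# label 1 = the model should refuse, 0 = it should answer
DATA = [
    (["en_harm", "filler"], 1),
    (["sw_harm", "filler"], 1),
    (["en_safe", "filler"], 0),
    (["sw_safe", "filler"], 0),
]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def mean_embedding(tokens):
    vecs = [EMB[t] for t in tokens]
    return [sum(col) / len(vecs) for col in zip(*vecs)]


def refusal_prob(prompt, tokens):
    """Frozen scorer: each soft-prompt vector interacts with the pooled input via a dot product."""
    x = mean_embedding(tokens)
    z = BIAS + dot(W, x) + sum(dot(p, x) for p in prompt)
    return 1.0 / (1.0 + math.exp(-z))


def train_soft_prompt(data, steps=500, lr=0.5):
    """Gradient descent on mean binary cross-entropy; W and BIAS are never updated."""
    dim = len(W)
    prompt = [[0.0] * dim for _ in range(NUM_PROMPT)]
    for _ in range(steps):
        grad = [0.0] * dim
        for tokens, label in data:
            x = mean_embedding(tokens)
            err = refusal_prob(prompt, tokens) - label  # dLoss/dz for sigmoid + BCE
            for d in range(dim):
                grad[d] += err * x[d] / len(data)
        # dz/dp_k = x for every prompt vector, so all vectors share this gradient
        for p in prompt:
            for d in range(dim):
                p[d] -= lr * grad[d]
    return prompt


untuned = [[0.0] * len(W) for _ in range(NUM_PROMPT)]
tuned = train_soft_prompt(DATA)
for tokens, label in DATA:
    print(tokens[0], label,
          round(refusal_prob(untuned, tokens), 2),
          round(refusal_prob(tuned, tokens), 2))
```

Before tuning, the frozen scorer refuses the English harmful input (p ≈ 0.73) but not its `sw` counterpart (p ≈ 0.35), a toy version of the translation jailbreak. Training one shared prompt on both languages at once, with benign examples included so the prompt is not rewarded for refusing everything, loosely echoes the summary's stated goals of cross-lingual safety and low false refusal.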

Takeaways, Limitations

Takeaways:
Presents a new method, MCD, for improving LLM safety in multilingual settings.
Demonstrates effective defense against multilingual jailbreak attacks.
Achieves a low false refusal rate and strong generalization.
Addresses safety inconsistencies across languages caused by imbalanced LLM training data.
Releases a multilingual jailbreak attack benchmark dataset.
Limitations:
The scale and diversity of the proposed multilingual benchmarks need further review.
Generalizability across diverse real-world jailbreak attack scenarios needs further validation.
The computational cost and efficiency of MCD need further analysis.
New jailbreak techniques require continuous monitoring, along with research on how well MCD adapts to them.
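The evaluation summarized here comes down to comparing, per language, how often harmful prompts get through and how often benign prompts are refused. A minimal sketch of that bookkeeping follows; the record format, the `per_language_rates` and `safety_gap` helpers, the `sw` placeholder language code, and the toy records are all hypothetical and are not results from the paper. Attack success rate (ASR) here means the share of harmful prompts that were not refused, and the spread between the worst and best language is one simple way to put a number on cross-language safety inconsistency.

```python
from collections import defaultdict


def per_language_rates(records):
    """records: iterable of (lang, is_harmful, refused) tuples (hypothetical format)."""
    harmful = defaultdict(lambda: [0, 0])  # lang -> [attacks that got through, total harmful]
    benign = defaultdict(lambda: [0, 0])   # lang -> [refused, total benign]
    for lang, is_harmful, refused in records:
        if is_harmful:
            harmful[lang][0] += int(not refused)
            harmful[lang][1] += 1
        else:
            benign[lang][0] += int(refused)
            benign[lang][1] += 1
    asr = {lang: ok / total for lang, (ok, total) in harmful.items()}
    refusal = {lang: r / total for lang, (r, total) in benign.items()}
    return asr, refusal


def safety_gap(asr):
    """Worst-minus-best ASR across languages; 0 means perfectly consistent."""
    return max(asr.values()) - min(asr.values())


# Made-up toy records for illustration only.
RECORDS = [
    ("en", True, True), ("en", True, True), ("en", True, True), ("en", True, False),
    ("sw", True, False), ("sw", True, False), ("sw", True, True), ("sw", True, False),
    ("en", False, False), ("en", False, False),
    ("sw", False, True), ("sw", False, False),
]

asr, refusal = per_language_rates(RECORDS)
print(asr)              # {'en': 0.25, 'sw': 0.75}
print(refusal)          # {'en': 0.0, 'sw': 0.5}
print(safety_gap(asr))  # 0.5
```

A defense that lowers ASR only in English would leave the gap large; the multilingual goal described above is to shrink both the per-language ASR and the gap while keeping the benign refusal rate low.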