
Daily Arxiv

This is a page that curates AI-related papers published worldwide.
All content here is summarized using Google Gemini, and the site is operated on a non-profit basis.
Copyright for each paper belongs to the authors and their institutions; please make sure to credit the source when sharing.

SEALGuard: Safeguarding the Multilingual Conversations in Southeast Asian Languages for LLM Software Systems

Created by
  • Haebom

Authors

Wenliang Shan, Michael Fu, Rui Yang, Chakkrit Tantithamthavorn

Outline

In this paper, we propose SEALGuard, a multilingual safeguard that addresses the limited multilingual support of existing large language model (LLM) safeguards such as LlamaGuard. SEALGuard is developed on SEALSBench, a large-scale multilingual safety-alignment dataset covering 10 languages, including low-resource ones. Using Low-Rank Adaptation (LoRA), we adapt a general-purpose multilingual language model into a multilingual safeguard, and a comparative evaluation against LlamaGuard confirms superior performance on multilingual malicious prompts and jailbreak attempts. In particular, whereas LlamaGuard's defense success rate (DSR) degrades on non-English malicious prompts and jailbreak attempts (drops of 9% and 18%, respectively), SEALGuard achieves better DSR, precision, and F1-score (a 48% DSR improvement over LlamaGuard). Ablation studies further analyze how the adaptation strategy and model size contribute to performance.
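To make the adaptation step concrete, below is a minimal sketch of how a general-purpose multilingual LLM can be wrapped with LoRA adapters and used as a binary safe/unsafe classifier, using Hugging Face transformers and peft. The base model name, prompt template, target modules, and hyperparameters are illustrative assumptions, not the paper's exact configuration.

```python
# Minimal sketch: adapting a general multilingual LLM into a safeguard
# with LoRA, in the spirit of SEALGuard. All names and hyperparameters
# below are illustrative assumptions, not the paper's configuration.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, TaskType, get_peft_model

BASE_MODEL = "Qwen/Qwen2-7B-Instruct"  # assumed placeholder; SEALGuard's actual base model may differ

tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL)
base_model = AutoModelForCausalLM.from_pretrained(BASE_MODEL, torch_dtype=torch.bfloat16)

# LoRA trains small low-rank adapters on selected projection matrices
# instead of updating all model weights (illustrative hyperparameters).
lora_config = LoraConfig(
    task_type=TaskType.CAUSAL_LM,
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],
)
model = get_peft_model(base_model, lora_config)
model.print_trainable_parameters()  # only the adapter weights are trainable

# Hypothetical prompt template for safety classification.
PROMPT = (
    "You are a safety classifier. Label the user message as 'safe' or 'unsafe'.\n"
    "Message: {message}\nLabel:"
)

def classify(message: str) -> str:
    """Compare next-token logits for 'safe' vs. 'unsafe' (simplified:
    uses only the first subword token of each label)."""
    inputs = tokenizer(PROMPT.format(message=message), return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits[0, -1]
    safe_id = tokenizer.encode(" safe", add_special_tokens=False)[0]
    unsafe_id = tokenizer.encode(" unsafe", add_special_tokens=False)[0]
    return "safe" if logits[safe_id] > logits[unsafe_id] else "unsafe"
```

After supervised fine-tuning of the adapters on a safety-alignment dataset such as SEALSBench, only the small LoRA weights need to be stored and loaded on top of the frozen base model, which is what makes this adaptation route efficient.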

Takeaways, Limitations

Takeaways:
Introducing SEALGuard, a multilingual safeguard that effectively addresses the lack of multilingual support in existing LLM safeguards
Laying the foundation for multilingual LLM safety research by building SEALSBench, a multilingual safety-alignment dataset
Presenting an efficient way to build a multilingual safeguard using LoRA
Contributing to improved LLM security in multilingual environments
Limitations:
Need to expand the diversity of languages and prompt types in the SEALSBench dataset
Need to validate generalization to new types of malicious prompts and jailbreak attempts
Need for further evaluation of performance and efficiency when deployed in real LLM systems
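
As a closing note on the evaluation above, the following sketch shows one plausible way to compute the reported metrics. It assumes DSR (defense success rate) denotes the fraction of unsafe prompts the safeguard correctly flags, i.e., recall on the unsafe class; the paper's exact definition may differ.

```python
# Illustrative metric helper; assumes DSR = recall on the unsafe class.
def safeguard_metrics(y_true, y_pred):
    """y_true/y_pred: lists of labels, 1 = unsafe, 0 = safe."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    dsr = tp / (tp + fn) if tp + fn else 0.0        # unsafe prompts blocked
    precision = tp / (tp + fp) if tp + fp else 0.0  # flagged prompts truly unsafe
    f1 = 2 * precision * dsr / (precision + dsr) if precision + dsr else 0.0
    return {"DSR": dsr, "precision": precision, "F1": f1}

# Toy example with made-up labels (not the paper's data):
print(safeguard_metrics([1, 1, 0, 1, 0], [1, 0, 0, 1, 1]))
# {'DSR': 0.666..., 'precision': 0.666..., 'F1': 0.666...}
```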