Daily Arxiv

This is a page that curates AI-related papers published worldwide.
All content here is summarized using Google Gemini and operated on a non-profit basis.
Copyright for each paper belongs to the authors and their institutions; please make sure to credit the source when sharing.

Improving LLM Outputs Against Jailbreak Attacks with Expert Model Integration

Created by
  • Haebom

Author

Tatia Tsmindashvili, Ana Kolkhidashvili, Dachi Kurtskhalia, Nino Maghlakelidze, Elene Mekvabishvili, Guram Dentoshvili, Orkhan Shamilov, Zaal Gachechiladze, Steven Saporta, David Dachi Choladze

Outline

This paper presents a novel approach to addressing security vulnerabilities, particularly jailbreak and prompt injection, that arise when using large-scale language models (LLMs) in production environments. We highlight the limitations of existing fine-tuning and API approaches and introduce Archias, a domain-specific expert model. Archias categorizes user queries into several categories—domain-specific, malicious, price-injected, prompt-injected, and out-of-domain—and integrates these results into the LLM's prompts to generate more appropriate responses. We validate our approach by building a benchmark dataset focused on the automotive industry, and we contribute to the advancement of research by making it publicly available.

Takeaways, Limitations

Takeaways:
Domain-specific LLM security enhancement measures presented: Effective response to domain-specific security threats through Archias.
Improving user intent understanding and generating appropriate responses: Leveraging Archias' classification results to improve LLM's response accuracy and safety.
Proving the utility of small-scale models: Archias' small size allows for easy customization for a variety of industries and purposes.
Release of Automotive Industry Benchmark Datasets: Contributing to Research and Development Progress.
Limitations:
Since this model is specialized for the automotive industry, it is necessary to verify its generalizability to other domains.
Archias' performance may depend on the LLM and dataset used.
Continuous updates and improvements are needed to address new jailbreaking techniques and prompt injection attacks.
👍