Daily Arxiv

This page organizes papers related to artificial intelligence published around the world.
This page is summarized using Google Gemini and is operated on a non-profit basis.
The copyright of the paper belongs to the author and the relevant institution. When sharing, simply cite the source.

X-Teaming Evolutionary M2S: Automated Discovery of Multi-turn to Single-turn Jailbreak Templates

Created by
  • Haebom

Author

Hyunjun Kim, Junwoo Ha, Sangyoon Yu, Haon Park

Outline

X-Teaming Evolutionary M2S is a framework that automatically discovers and optimizes multi-turn-to-single-turn (M2S) templates through language-model-guided evolution. It combines smart sampling from 12 sources with an LLM-as-judge inspired by StrongREJECT, and it keeps a complete audit log. With the success threshold set to $\theta = 0.70$, five generations of evolution produced two new template families and an overall success rate of 44.8% (103/230) on GPT-4.1. The authors also observed that structural improvements varied across target models and that prompt length correlated positively with judge scores.
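To make the threshold arithmetic concrete, here is a minimal sketch of how a success rate at $\theta = 0.70$ could be computed from judge scores. The `Trial` record, the `success_rate` helper, the assumption that scores are normalized to [0, 1], and the use of `>=` rather than `>` are all illustrative choices, not details confirmed by the paper; the sample scores are made up.

```python
from dataclasses import dataclass

THETA = 0.70  # success threshold quoted in the summary


@dataclass
class Trial:
    prompt_id: str      # hypothetical identifier for one evaluated prompt
    judge_score: float  # assumed to be normalized to [0, 1]


def success_rate(trials, theta=THETA):
    """Return (successes, total, rate) for trials scoring at or above theta."""
    if not trials:
        raise ValueError("need at least one trial")
    # Assumption: a score exactly equal to theta counts as a success.
    successes = sum(1 for t in trials if t.judge_score >= theta)
    return successes, len(trials), successes / len(trials)


# Made-up scores, for illustration only.
trials = [Trial("a", 0.82), Trial("b", 0.55), Trial("c", 0.70), Trial("d", 0.10)]
successes, total, rate = success_rate(trials)
print(f"{successes}/{total} = {rate:.1%}")

# The reported figure follows from the same arithmetic: 103 successes in 230 trials.
print(f"103/230 = {103 / 230:.1%}")
```

Raising `theta` can only keep the count of successes the same or lower it, which is one reason the chosen threshold affects how much room remains for later generations to improve.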

Takeaways, Limitations

We present a reproducible method for creating powerful single-turn prompts using structured search.
The importance of threshold adjustment and cross-model evaluation was emphasized.
We found a positive correlation between prompt length and scores, which suggests the need for length-aware judging.
Although the paper does not explicitly list its limitations, performance variation across target models should be taken into account.
The headline results are reported for a single target model (GPT-4.1), so further research is needed to determine how well they generalize to other models.
Although the work improves on previous studies that relied on manually written templates, it does not discuss potential problems that may arise from automating the framework.
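The length and score relationship noted above can be checked with a plain Pearson correlation. The sketch below is only an illustration: the `pearson` helper is hypothetical, the data are invented, and the paper may measure the relationship differently.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


# Invented prompt lengths (in characters) and judge scores, for illustration only.
lengths = [120, 250, 400, 610]
scores = [0.2, 0.4, 0.5, 0.8]
r = pearson(lengths, scores)
print(f"r = {r:.3f}")
```

A strongly positive `r` on real data would indicate that longer prompts tend to receive higher scores, which is the kind of length effect a length-aware judge would need to control for.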