Daily Arxiv

This page organizes papers related to artificial intelligence published around the world.
The summaries on this page are generated with Google Gemini, and the page is operated on a non-profit basis.
The copyright of each paper belongs to its authors and their institutions. When sharing, please cite the source.

X-Teaming Evolutionary M2S: Automated Discovery of Multi-turn to Single-turn Jailbreak Templates

Created by
  • Haebom

Authors

Hyunjun Kim, Junwoo Ha, Sangyoon Yu, Haon Park

X-Teaming Evolutionary M2S: Discovery and Optimization of M2S Templates through an Automated Framework

Outline

Multi-turn-to-single-turn (M2S) methods compress iterative red-teaming into a single structured prompt, but existing M2S templates are written by hand. This paper presents X-Teaming Evolutionary M2S, a framework that automatically discovers and optimizes M2S templates through language-model-guided evolution. The framework combines smart sampling from 12 sources with an LLM-as-judge inspired by StrongREJECT, and it records fully auditable logs. With the success threshold set to $\theta = 0.70$, five generations of evolution yield two new template families and an overall success rate of 44.8% (103/230) on GPT-4.1. The authors also find that structural gains vary across target models, and that prompt length is positively correlated with judge scores.
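The headline figure is a thresholded success rate. As a minimal sketch, assuming each attempt receives one judge score normalized to [0, 1] and counts as a success when the score is at least $\theta$ (the summary does not say whether the comparison is inclusive), the rate is simply the fraction of attempts at or above the threshold. The function name `success_rate` is hypothetical and not taken from the paper.

```python
def success_rate(scores: list[float], theta: float = 0.70) -> float:
    """Fraction of judge scores at or above the threshold theta.

    Assumes scores are normalized to [0, 1]; the inclusive >= comparison
    is an assumption, not something the summary specifies.
    """
    if not scores:
        raise ValueError("scores must be non-empty")
    successes = sum(1 for s in scores if s >= theta)
    return successes / len(scores)


# The reported figure: 103 successes out of 230 attempts.
print(f"{103 / 230:.1%}")  # prints 44.8%
```

For example, `success_rate([0.9, 0.5, 0.7, 0.2])` counts two of four scores as successes and returns 0.5. Raising or lowering `theta` changes the reported rate directly, which is why the threshold choice matters when comparing results across models.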

Takeaways, Limitations

Takeaways:
Structure-level search is shown to be a reproducible route to stronger single-turn probes.
The results underscore the importance of threshold calibration and cross-model evaluation.
Prompt length is positively correlated with judge scores, which motivates length-aware evaluation.
Limitations:
The method's performance may not carry over to other models.
Further research is needed to establish how well the optimized templates generalize.
The abstract alone does not convey the full details of the framework.