Link: http://arxiv.org/abs/2509.08729v1
PDF Link: http://arxiv.org/pdf/2509.08729v1
Summary: Multi-turn-to-single-turn (M2S) compresses iterative red-teaming into one structured prompt, but prior work relied on a handful of manually written templates.
We present X-Teaming Evolutionary M2S, an automated framework that discovers and optimizes M2S templates through language-model-guided evolution.
The system pairs smart sampling from 12 sources with an LLM-as-judge inspired by StrongREJECT and records fully auditable logs.
Maintaining selection pressure by setting the success threshold to $\theta = 0.70$, we obtain five evolutionary generations, two new template families, and 44.8% overall success (103/230) on GPT-4.1.
A balanced cross-model panel of 2,500 trials (judge fixed) shows that structural gains transfer but vary by target; two models score zero at the same threshold.
We also find a positive coupling between prompt length and score, motivating length-aware judging.
Our results demonstrate that structure-level search is a reproducible route to stronger single-turn probes and underscore the importance of threshold calibration and cross-model evaluation.
Code, configurations, and artifacts are available at https://github.com/hyunjun1121/M2S-x-teaming.
Published on arXiv on: 2025-09-10T16:17:44Z
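
The headline figure in the summary (103 successes in 230 trials at a threshold of 0.70) can be checked with a short sketch. It assumes that judge scores are normalized to [0, 1] and that a trial counts as a success when its score is at least the threshold. Neither detail is confirmed by the summary, and the function name `success_rate` is hypothetical, not taken from the authors' repository.

```python
def success_rate(scores, theta=0.70):
    """Count trials whose judge score meets the threshold.

    Assumption: scores lie in [0, 1] and success means score >= theta.
    Returns (successes, total, rate).
    """
    successes = sum(1 for s in scores if s >= theta)
    total = len(scores)
    rate = successes / total if total else 0.0
    return successes, total, rate


# Aggregate reported in the summary: 103 successes out of 230 trials.
reported_rate = 103 / 230
print(f"{reported_rate:.1%}")  # 44.8%
```

Raising or lowering `theta` in such a sketch shows directly how sensitive a reported success rate is to the chosen threshold, which is the calibration concern the summary raises.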