Link: http://arxiv.org/abs/2511.03271v1
PDF Link: http://arxiv.org/pdf/2511.03271v1
Summary: Large Language Models (LLMs) have been widely deployed across various applications, yet their potential security and ethical risks have raised increasing concerns.
Existing research employs red teaming evaluations, utilizing multi-turn jailbreaks to identify potential vulnerabilities in LLMs.
However, these approaches often lack exploration of successful dialogue trajectories within the attack space, and they tend to overlook the considerable overhead associated with the attack process.
To address these limitations, this paper first introduces a theoretical model based on dynamically weighted graph topology, abstracting the multi-turn attack process as a path planning problem.
Based on this framework, we propose ABC, an enhanced Artificial Bee Colony algorithm for multi-turn jailbreaks, featuring a collaborative search mechanism with employed, onlooker, and scout bees.
This algorithm significantly improves the efficiency of optimal attack path search while substantially reducing the average number of queries required.
Empirical evaluations on three open-source and two proprietary language models demonstrate the effectiveness of our approach, achieving attack success rates above 90% across the board, with a peak of 98% on GPT-3.5-Turbo, and outperforming existing baselines.
Furthermore, it achieves comparable success with only 26 queries on average, significantly reducing red teaming overhead and highlighting its superior efficiency.
Published on arXiv on: 2025-11-05T08:05:58Z