Link: http://arxiv.org/abs/2508.13048v1
PDF Link: http://arxiv.org/pdf/2508.13048v1
Summary: Large Language Models (LLMs) have exhibited remarkable capabilities but remain vulnerable to jailbreaking attacks, which can elicit harmful content from the models by manipulating the input prompts.
Existing black-box jailbreaking techniques primarily rely on static prompts crafted with a single, non-adaptive strategy, or employ rigid combinations of several underperforming attack methods, which limits their adaptability and generalization.
To address these limitations, we propose MAJIC, a Markovian adaptive jailbreaking framework that attacks black-box LLMs by iteratively combining diverse innovative disguise strategies.
MAJIC first establishes a "Disguise Strategy Pool" by refining existing strategies and introducing several innovative approaches.
To further improve attack performance and efficiency, MAJIC formulates the sequential selection and fusion of strategies in the pool as a Markov chain.
Under this formulation, MAJIC initializes and employs a Markov matrix to guide the strategy composition, where transition probabilities between strategies are dynamically adapted based on attack outcomes, thereby enabling MAJIC to learn and discover effective attack pathways tailored to the target model.
Our empirical results demonstrate that MAJIC significantly outperforms existing jailbreak methods on prominent models such as GPT-4o and Gemini-2.0-flash, achieving over 90% attack success rate with fewer than 15 queries per attempt on average.
Published on arXiv on: 2025-08-18T16:09:57Z