Link: http://arxiv.org/abs/2505.21277v1
PDF Link: http://arxiv.org/pdf/2505.21277v1
Summary: Large Language Models (LLMs), despite advanced general capabilities, still suffer from numerous safety risks, especially jailbreak attacks that bypass safety protocols.
Understanding these vulnerabilities through black-box jailbreak attacks, which better reflect real-world scenarios, offers critical insights into model robustness.
While existing methods have shown improvements through various prompt engineering techniques, their success remains limited against safety-aligned models, overlooking a more fundamental problem: their effectiveness is inherently bounded by the predefined strategy space.
However, expanding this space presents significant challenges in both systematically capturing essential attack patterns and efficiently navigating the increased complexity.
To better explore the potential of expanding the strategy space, we address these challenges through a novel framework that decomposes jailbreak strategies into essential components based on the Elaboration Likelihood Model (ELM) theory and develops genetic-based optimization with intention evaluation mechanisms.
Strikingly, our experiments reveal unprecedented jailbreak capabilities by expanding the strategy space: we achieve a success rate of over 90% on Claude-3.5, where prior methods completely fail, while demonstrating strong cross-model transferability and surpassing specialized safeguard models in evaluation accuracy.
The code is open-sourced at: https://github.com/Aries-iai/CL-GSO.
Published on arXiv on: 2025-05-27T14:48:44Z