
Multi-Turn Jailbreaks Are Simpler Than They Seem

Link: http://arxiv.org/abs/2508.07646v1

PDF Link: http://arxiv.org/pdf/2508.07646v1

Summary: While defenses against single-turn jailbreak attacks on Large Language Models (LLMs) have improved significantly, multi-turn jailbreaks remain a persistent vulnerability, often achieving success rates exceeding 70% against models optimized for single-turn protection.

This work presents an empirical analysis of automated multi-turn jailbreak attacks across state-of-the-art models including GPT-4, Claude, and Gemini variants, using the StrongREJECT benchmark.

Our findings challenge the perceived sophistication of multi-turn attacks: when accounting for the attacker's ability to learn from how models refuse harmful requests, multi-turn jailbreaking approaches are approximately equivalent to simply resampling single-turn attacks multiple times.
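
To make the resampling baseline concrete, here is a minimal sketch (not from the paper) assuming independent attempts: with a per-attempt single-turn success probability p, k resamples succeed with probability 1 - (1 - p)^k. The numbers used below (p_single, multi_turn_rate) are hypothetical placeholders for illustration only.

```python
# Illustrative sketch (assumptions: independent attempts, hypothetical rates).
# Under these assumptions, resampling a single-turn attack k times succeeds
# with probability 1 - (1 - p)^k. Comparing that curve against an observed
# multi-turn success rate at a comparable query budget is one way to gauge
# whether a multi-turn attack adds anything beyond repeated sampling.

def resampled_success_rate(p_single: float, k: int) -> float:
    """Probability that at least one of k independent single-turn attempts succeeds."""
    return 1.0 - (1.0 - p_single) ** k

if __name__ == "__main__":
    p_single = 0.15          # hypothetical per-attempt single-turn success rate
    multi_turn_rate = 0.70   # hypothetical observed multi-turn success rate
    for k in (1, 3, 5, 10, 20):
        rate = resampled_success_rate(p_single, k)
        print(f"k={k:2d}  resampled single-turn success ~ {rate:.2f}")
    # If the multi-turn rate sits near the resampled curve at a similar number
    # of queries, the multi-turn attack is roughly equivalent to resampling.
```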

Moreover, attack success is correlated among similar models, making it easier to jailbreak newly released ones.

Additionally, for reasoning models, we find, surprisingly, that higher reasoning effort often leads to higher attack success rates.

Our results have important implications for AI safety evaluation and the design of jailbreak-resistant systems.

We release the source code at https://github.com/diogo-cruz/multi_turn_simpler.

Published on arXiv on: 2025-08-11T05:57:41Z