Link: http://arxiv.org/abs/2505.19684v1
PDF Link: http://arxiv.org/pdf/2505.19684v1
Summary: The emergence of Multimodal Large Reasoning Models (MLRMs) has enabled sophisticated visual reasoning capabilities by integrating reinforcement learning and Chain-of-Thought (CoT) supervision.
However, while these enhanced reasoning capabilities improve performance, they also introduce new and underexplored safety risks.
In this work, we systematically investigate the security implications of advanced visual reasoning in MLRMs.
Our analysis reveals a fundamental trade-off: as visual reasoning improves, models become more vulnerable to jailbreak attacks.
Motivated by this critical finding, we introduce VisCRA (Visual Chain Reasoning Attack), a novel jailbreak framework that exploits visual reasoning chains to bypass safety mechanisms.
VisCRA combines targeted visual attention masking with a two-stage reasoning induction strategy to precisely control harmful outputs.
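As an illustration only (the abstract gives no implementation details), the minimal Python sketch below shows one plausible way the two named components could fit together: mask the most-attended image region, then query the model in two stages so the second prompt builds on the first reasoning step. All names here (`mask_top_attention_region`, `query_mlrm`, the prompts) are hypothetical placeholders, not the authors' code or API.

```python
# Hypothetical sketch of the two components named in the abstract:
# (1) targeted visual attention masking and (2) two-stage reasoning induction.
# Uses only NumPy plus a stubbed model call; NOT the authors' implementation.
import numpy as np


def mask_top_attention_region(image: np.ndarray, attention_map: np.ndarray,
                              patch: int = 32) -> np.ndarray:
    """Black out the image patch where the given attention map peaks."""
    h, w = attention_map.shape
    y, x = np.unravel_index(np.argmax(attention_map), (h, w))
    masked = image.copy()
    y0, x0 = max(0, y - patch // 2), max(0, x - patch // 2)
    masked[y0:y0 + patch, x0:x0 + patch] = 0  # zero out the high-attention patch
    return masked


def query_mlrm(image: np.ndarray, prompt: str) -> str:
    """Placeholder for a multimodal model API call (assumption, not a real API)."""
    return f"[model response to {prompt!r} on image of shape {image.shape}]"


def two_stage_induction(image: np.ndarray, attention_map: np.ndarray) -> str:
    """Stage 1 elicits an inference about the masked region; stage 2 asks the
    model to continue that reasoning chain. Prompts are illustrative only."""
    masked = mask_top_attention_region(image, attention_map)
    stage1 = query_mlrm(masked, "Describe what is likely hidden in the masked region.")
    stage2 = query_mlrm(image, f"Given your earlier inference ({stage1}), "
                               "continue the reasoning chain step by step.")
    return stage2


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    img = rng.integers(0, 255, size=(224, 224, 3), dtype=np.uint8)
    attn = rng.random((224, 224))
    print(two_stage_induction(img, attn))
```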
Extensive experiments demonstrate VisCRA's significant effectiveness, achieving high attack success rates on leading closed-source MLRMs: 76.48% on Gemini 2.0 Flash Thinking, 68.56% on QvQ-Max, and 56.60% on GPT-4o.
Our findings highlight a critical insight: the very visual reasoning capability that empowers MLRMs can also serve as an attack vector, posing significant security risks.
Published on arXiv on: 2025-05-26T08:45:06Z