Link: http://arxiv.org/abs/2511.07315v1
PDF Link: http://arxiv.org/pdf/2511.07315v1
Summary: The widespread application of large VLMs makes ensuring their secure deployment critical.
While recent studies have demonstrated jailbreak attacks on VLMs, existing approaches are limited: they either require white-box access, restricting practicality, or rely on manually crafted patterns, leading to poor sample diversity and scalability.
To address these gaps, we propose JPRO, a novel multi-agent collaborative framework designed for automated VLM jailbreaking.
It effectively overcomes the shortcomings of prior methods in attack diversity and scalability.
Through the coordinated action of four specialized agents and its two core modules, Tactic-Driven Seed Generation and Adaptive Optimization Loop, JPRO generates effective and diverse attack samples.
Experimental results show that JPRO achieves over a 60% attack success rate on multiple advanced VLMs, including GPT-4o, significantly outperforming existing methods.
As a black-box attack approach, JPRO not only uncovers critical security vulnerabilities in multimodal models but also offers valuable insights for evaluating and enhancing VLM robustness.
Published on arXiv on: 2025-11-10T17:16:46Z