
PiCo: Jailbreaking Multimodal Large Language Models via $\textbf{Pi}$ctorial $\textbf{Co}$de Contextualization

Link: http://arxiv.org/abs/2504.01444v1

PDF Link: http://arxiv.org/pdf/2504.01444v1

Summary: Multimodal Large Language Models (MLLMs), which integrate vision and other modalities into Large Language Models (LLMs), significantly enhance AI capabilities but also introduce new security vulnerabilities.

By exploiting the vulnerabilities of the visual modality and the long-tail distribution characteristic of code training data, we present PiCo, a novel jailbreaking framework designed to progressively bypass multi-tiered defense mechanisms in advanced MLLMs.

PiCo employs a tier-by-tier jailbreak strategy, using token-level typographic attacks to evade input filtering and embedding harmful intent within programming context instructions to bypass runtime monitoring.

To comprehensively assess the impact of attacks, a new evaluation metric is further proposed that measures both the toxicity and the helpfulness of model outputs post-attack.
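The abstract does not give the metric's exact formulation; the sketch below only illustrates the idea of scoring a post-attack response on both axes at once. The class and function names, the 0-1 score scales, and the multiplicative aggregation are all assumptions for illustration, not the paper's definition.

```python
# Hypothetical sketch of a combined post-attack evaluation score.
# Assumes per-response toxicity and helpfulness judgments in [0, 1];
# the product-style aggregation is an illustrative choice, not PiCo's metric.
from dataclasses import dataclass


@dataclass
class AttackOutcome:
    response: str
    toxicity: float      # 1.0 = maximally harmful content produced
    helpfulness: float   # 1.0 = response actually addresses the request


def combined_score(outcome: AttackOutcome) -> float:
    """An attack only matters if the output is both harmful and usable,
    so a multiplicative combination penalizes refusals (low helpfulness)
    and off-topic or benign answers (low toxicity) alike."""
    return outcome.toxicity * outcome.helpfulness
```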

By embedding harmful intent within code-style visual instructions, PiCo achieves an average Attack Success Rate (ASR) of 84.13% on Gemini-Pro Vision and 52.66% on GPT-4, surpassing previous methods.
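For reference, Attack Success Rate is conventionally the fraction of attack attempts judged successful; a minimal sketch follows, where the per-attempt success flags (and how they are judged) are assumptions, not details from the paper.

```python
def attack_success_rate(attempt_results: list[bool]) -> float:
    """Return the ASR in percent: successful attempts / total attempts.
    `attempt_results` holds one success/failure flag per attempted jailbreak."""
    if not attempt_results:
        return 0.0
    return 100.0 * sum(attempt_results) / len(attempt_results)

# e.g. an ASR of 84.13% corresponds to roughly 84 successes per 100 attempts.
```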

Experimental results highlight the critical gaps in current defenses, underscoring the need for more robust strategies to secure advanced MLLMs.

Published on arXiv on: 2025-04-02T07:54:32Z