Link: http://arxiv.org/abs/2412.00473v2
PDF Link: http://arxiv.org/pdf/2412.00473v2
Summary: With the significant advancement of Large Vision-Language Models (VLMs), concerns about their potential misuse and abuse have grown rapidly.
Previous studies have highlighted VLMs' vulnerability to jailbreak attacks, where carefully crafted inputs can lead the model to produce content that violates ethical and legal standards.
However, existing methods struggle against state-of-the-art VLMs like GPT-4o, due to the over-exposure of harmful content and lack of stealthy malicious guidance.
In this work, we propose a novel jailbreak attack framework: Multi-Modal Linkage (MML) Attack.
Drawing inspiration from cryptography, MML utilizes an encryption-decryption process across text and image modalities to mitigate over-exposure of malicious information.
To align the model's output with malicious intent covertly, MML employs a technique called "evil alignment", framing the attack within a video game production scenario.
Comprehensive experiments demonstrate MML's effectiveness.
Specifically, MML jailbreaks GPT-4o with attack success rates of 97.80% on SafeBench, 98.81% on MM-SafeBench, and 99.07% on HADES-Dataset.
Our code is available at https://github.com/wangyu-ovo/MML
Published on arXiv on: 2024-11-30T13:21:15Z