arxiv papers

Attention! You Vision Language Model Could Be Maliciously Manipulated

Link: http://arxiv.org/abs/2505.19911v1

PDF Link: http://arxiv.org/pdf/2505.19911v1

Summary: Large Vision-Language Models (VLMs) have achieved remarkable success in understanding complex real-world scenarios and supporting data-driven decision-making processes. However, VLMs exhibit significant vulnerability against adversarial examples, either text or image, which can lead to various adversarial outcomes, e.g., jailbreaking, hijacking, and hallucination. In this work, we empirically and theoretically demonstrate that VLMs are particularly susceptible to image-based adversarial examples, where imperceptible perturbations can precisely manipulate each output token. To this end, we propose a novel attack called Vision-language model Manipulation Attack (VMA), which integrates first-order and second-order momentum optimization techniques with a differentiable transformation mechanism to effectively optimize the adversarial perturbation. Notably, VMA can be a double-edged sword: it can be leveraged to implement various attacks, such as jailbreaking, hijacking, privacy breaches, denial-of-service, and the generation of sponge examples, while simultaneously enabling the injection of watermarks for copyright protection. Extensive empirical evaluations substantiate the efficacy and generalizability of VMA across diverse scenarios and datasets.
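To make the optimization idea concrete, here is a minimal sketch of how an attack in this style could be set up: an image perturbation is optimized with Adam-style first- and second-order momentum through a differentiable transformation (a tanh reparameterization) that keeps it imperceptibly small, so the perturbed image pushes the victim VLM toward an attacker-chosen output. This is not the paper's implementation; the function name `vma_attack`, the `loss_fn` interface, and all hyperparameters are illustrative assumptions.

```python
# Hypothetical sketch, not the paper's code: momentum-based optimization of an
# image perturbation through a differentiable transformation (tanh bound).
import torch

def vma_attack(image, loss_fn, epsilon=8 / 255, steps=200, lr=0.01,
               beta1=0.9, beta2=0.999, eps=1e-8):
    """Return an adversarial image within an L-infinity ball of radius `epsilon`.

    image   : (C, H, W) tensor with values in [0, 1]
    loss_fn : callable mapping a perturbed image to a scalar loss, e.g. the
              negative log-likelihood of the attacker's target token sequence
              under the victim VLM (kept abstract here).
    """
    # Unconstrained variable; tanh keeps the perturbation inside [-epsilon, epsilon]
    # while remaining differentiable, so gradients flow through the transformation.
    w = torch.zeros_like(image, requires_grad=True)
    m = torch.zeros_like(image)   # first-order momentum (running mean of gradients)
    v = torch.zeros_like(image)   # second-order momentum (running mean of squared gradients)

    for t in range(1, steps + 1):
        delta = epsilon * torch.tanh(w)           # differentiable, bounded perturbation
        adv = (image + delta).clamp(0.0, 1.0)     # keep a valid image
        loss = loss_fn(adv)
        grad, = torch.autograd.grad(loss, w)

        # Adam-style update combining first- and second-order momentum.
        m = beta1 * m + (1 - beta1) * grad
        v = beta2 * v + (1 - beta2) * grad ** 2
        m_hat = m / (1 - beta1 ** t)
        v_hat = v / (1 - beta2 ** t)
        with torch.no_grad():
            w -= lr * m_hat / (v_hat.sqrt() + eps)

    return (image + epsilon * torch.tanh(w)).clamp(0.0, 1.0).detach()
```

In practice, `loss_fn` would wrap the victim VLM's forward pass with teacher forcing on the attacker-chosen target tokens, which is how imperceptible pixel changes can steer each generated token; the same machinery could presumably be repurposed to embed a benign watermark, as the abstract notes.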

Published on arXiv: 2025-05-26T12:38:58Z