
Safeguarding Vision-Language Models: Mitigating Vulnerabilities to Gaussian Noise in Perturbation-based Attacks

Link: http://arxiv.org/abs/2504.01308v1

PDF Link: http://arxiv.org/pdf/2504.01308v1

Summary: Vision-Language Models (VLMs) extend the capabilities of Large Language Models (LLMs) by incorporating visual information, yet they remain vulnerable to jailbreak attacks, especially when processing noisy or corrupted images.

Although existing VLMs adopt security measures during training to mitigate such attacks, vulnerabilities associated with noise-augmented visual inputs are overlooked.

In this work, we identify that the absence of noise-augmented training causes critical security gaps: many VLMs are susceptible to even simple perturbations such as Gaussian noise.
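
To make the threat concrete, the snippet below is a minimal sketch of the kind of simple, non-optimized perturbation in question: i.i.d. Gaussian noise added to an image before it reaches the VLM. The function name and the noise scale `sigma` are illustrative choices, not part of the paper's released code.

```python
import torch

def add_gaussian_noise(image: torch.Tensor, sigma: float = 0.05) -> torch.Tensor:
    """Perturb an image tensor with values in [0, 1] using i.i.d. Gaussian noise.

    This is the simple, non-adversarial corruption the abstract refers to;
    sigma is an illustrative noise scale, not a value from the paper.
    """
    noise = torch.randn_like(image) * sigma
    return (image + noise).clamp(0.0, 1.0)

# Example: noisy = add_gaussian_noise(image, sigma=0.1)
```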

To address this challenge, we propose Robust-VLGuard, a multimodal safety dataset with aligned and misaligned image-text pairs, combined with noise-augmented fine-tuning that reduces attack success rates while preserving the functionality of the VLM.
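
As a rough illustration of what noise-augmented fine-tuning can look like, the sketch below randomly corrupts a fraction of the training images with Gaussian noise so that safety fine-tuning also covers noisy visual inputs. The batching helper, probability `p`, and `sigma_range` are hypothetical; the paper's actual training recipe is in its released code.

```python
import random
import torch

def noise_augment_batch(images: torch.Tensor,
                        p: float = 0.5,
                        sigma_range: tuple = (0.01, 0.1)) -> torch.Tensor:
    """With probability p, corrupt each image in the batch with Gaussian noise
    whose scale is drawn uniformly from sigma_range.

    Applying this during safety fine-tuning exposes the model to the same
    kind of noisy inputs it may face at inference time.
    """
    out = images.clone()
    for i in range(out.shape[0]):
        if random.random() < p:
            sigma = random.uniform(*sigma_range)
            out[i] = (out[i] + torch.randn_like(out[i]) * sigma).clamp(0.0, 1.0)
    return out

# Inside an ordinary fine-tuning loop (model and loss are placeholders):
# images = noise_augment_batch(images)
# loss = safety_finetune_step(model, images, texts, targets)
```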

For stronger optimization-based visual perturbation attacks, we propose DiffPure-VLM, which leverages diffusion models to convert adversarial perturbations into Gaussian-like noise that can be defended against by VLMs with noise-augmented safety fine-tuning.
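
The sketch below shows the general shape of diffusion purification as described here: partially forward-diffuse the (possibly adversarial) image, then run a pretrained diffusion model's reverse process back to a clean image before handing it to the VLM. The `denoise` callable, `alphas_cumprod` schedule, and `t_star` are stand-ins for whatever diffusion model and settings are actually used; this is not the paper's implementation.

```python
import torch

def diffpure_purify(image: torch.Tensor,
                    alphas_cumprod: torch.Tensor,
                    denoise,
                    t_star: int = 100) -> torch.Tensor:
    """Sketch of diffusion purification.

    `denoise(x_t, t)` stands in for a pretrained diffusion model's reverse
    (denoising) process; `alphas_cumprod` is its cumulative noise schedule.
    """
    a_bar = alphas_cumprod[t_star]
    eps = torch.randn_like(image)
    # Forward diffusion: x_t = sqrt(a_bar) * x_0 + sqrt(1 - a_bar) * eps.
    # This step drowns the structured adversarial perturbation in Gaussian noise.
    x_t = a_bar.sqrt() * image + (1.0 - a_bar).sqrt() * eps
    # Reverse process back to t = 0; any residual perturbation tends to be
    # Gaussian-like, which noise-augmented safety fine-tuning is meant to handle.
    return denoise(x_t, t_star)

# purified = diffpure_purify(adv_image, scheduler_alphas, denoiser_fn)
# response = vlm.generate(purified, prompt)  # placeholder VLM call
```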

Experimental results demonstrate that the distribution-shifting property of the diffusion model aligns well with our fine-tuned VLMs, significantly mitigating adversarial perturbations across varying intensities.

The dataset and code are available at https://github.com/JarvisUSTC/DiffPure-RobustVLM.

Published on arXiv on: 2025-04-02T02:35:19Z