Link: http://arxiv.org/abs/2506.19257v1
PDF Link: http://arxiv.org/pdf/2506.19257v1
Summary: Vision-Language Models (VLMs) have achieved remarkable progress in multimodal reasoning tasks through enhanced chain-of-thought capabilities.
However, this advancement also introduces novel safety risks, as these models become increasingly vulnerable to harmful multimodal prompts that can trigger unethical or unsafe behaviors.
Existing safety alignment approaches, primarily designed for unimodal language models, fall short in addressing the complex and nuanced threats posed by multimodal inputs.
Moreover, current safety datasets lack the fine-grained, policy-grounded reasoning required to robustly align reasoning-capable VLMs.
In this work, we introduce MSR-Align, a high-quality Multimodal Safety Reasoning dataset tailored to bridge this gap.
MSR-Align supports fine-grained, deliberative reasoning over standardized safety policies across both vision and text modalities.
Our data generation pipeline emphasizes multimodal diversity, policy-grounded reasoning, and rigorous quality filtering using strong multimodal judges.
Extensive experiments demonstrate that fine-tuning VLMs on MSR-Align substantially improves robustness against both textual and vision-language jailbreak attacks, while preserving or enhancing general reasoning performance.
MSR-Align provides a scalable and effective foundation for advancing the safety alignment of reasoning-capable VLMs.
Our dataset is made publicly available at https://huggingface.co/datasets/Leigest/MSR-Align.
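Since the dataset is hosted on the Hugging Face Hub, one plausible way to inspect it is with the datasets library. The sketch below is illustrative only: the abstract does not document the splits or column names, so the code loads the repository and prints its schema rather than assuming specific fields.

# Minimal sketch for inspecting MSR-Align from the Hugging Face Hub.
# Splits and columns are not specified in the abstract, so nothing beyond
# the dataset identifier is assumed here.
from datasets import load_dataset

ds = load_dataset("Leigest/MSR-Align")

# Show the available splits, then the columns and first record of one split.
print(ds)
first_split = next(iter(ds.values()))
print(first_split.column_names)
print(first_split[0])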
Published on arXiv on: 2025-06-24T02:37:59Z