Divide and Conquer: A Hybrid Strategy Defeats Multimodal Large Language Models

Link: http://arxiv.org/abs/2412.16555v1

PDF Link: http://arxiv.org/pdf/2412.16555v1

Summary: Large language models (LLMs) are widely applied in various fields of society due to their powerful reasoning, understanding, and generation capabilities.

However, the security issues associated with these models are becoming increasingly severe.

Jailbreaking attacks, an important means of detecting vulnerabilities in LLMs, have been explored by researchers who attempt to induce these models to generate harmful content through various attack methods.

Nevertheless, existing jailbreaking methods face numerous limitations, such as excessive query counts, limited coverage of jailbreak modalities, low attack success rates, and simplistic evaluation methods.

To overcome these constraints, this paper proposes a multimodal jailbreaking method: JMLLM.

This method integrates multiple strategies to perform comprehensive jailbreak attacks across text, visual, and auditory modalities.

Additionally, we contribute a new and comprehensive dataset for multimodal jailbreaking research, TriJail, which includes jailbreak prompts for all three modalities.

Experiments on the TriJail dataset and the benchmark dataset AdvBench, conducted on 13 popular LLMs, demonstrate high attack success rates and a significant reduction in time overhead.
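The summary reports results as attack success rates but does not say how an individual attempt is judged successful. A common convention computes the success rate as the number of successful attempts divided by the total number of attempts, usually broken down by target model. The sketch below is a minimal illustration under that assumption. It treats each attempt as already carrying a binary success label, whether that label comes from a human or an automated judge. The names `AttackRecord`, `attack_success_rate`, and `per_model_asr` are hypothetical and do not come from the paper.

```python
from dataclasses import dataclass


@dataclass
class AttackRecord:
    model: str      # hypothetical: name of the evaluated target model
    prompt_id: str  # hypothetical: identifier of the test prompt
    success: bool   # assumed binary judgment, produced elsewhere


def attack_success_rate(records: list[AttackRecord]) -> float:
    """Fraction of attempts labeled successful; 0.0 for an empty list."""
    if not records:
        return 0.0
    return sum(r.success for r in records) / len(records)


def per_model_asr(records: list[AttackRecord]) -> dict[str, float]:
    """Group attempts by target model and compute the rate for each group."""
    by_model: dict[str, list[AttackRecord]] = {}
    for r in records:
        by_model.setdefault(r.model, []).append(r)
    return {m: attack_success_rate(rs) for m, rs in by_model.items()}


if __name__ == "__main__":
    records = [
        AttackRecord("model-a", "p1", True),
        AttackRecord("model-a", "p2", False),
        AttackRecord("model-b", "p1", True),
    ]
    print(per_model_asr(records))        # {'model-a': 0.5, 'model-b': 1.0}
    print(attack_success_rate(records))  # 2 of 3 attempts
```

Reporting the rate per model rather than as a single pooled number keeps results for one strong or weak model from masking the others. A pooled figure over 13 models would weight each model by its number of attempts.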

Published on arXiv on: 2024-12-21T09:43:51Z