
Mind the Trojan Horse: Image Prompt Adapter Enabling Scalable and Deceptive Jailbreaking

Link: http://arxiv.org/abs/2504.05838v1

PDF Link: http://arxiv.org/pdf/2504.05838v1

Summary: Recently, the Image Prompt Adapter (IP-Adapter) has been increasingly integrated into text-to-image diffusion models (T2I-DMs) to improve controllability.

However, in this paper, we reveal that T2I-DMs equipped with the IP-Adapter (T2I-IP-DMs) enable a new jailbreak attack named the hijacking attack.

We demonstrate that, by uploading imperceptible image-space adversarial examples (AEs), the adversary can hijack massive benign users to jailbreak an Image Generation Service (IGS) driven by T2I-IP-DMs and mislead the public to discredit the service provider.

Worse still, the IP-Adapter's dependency on open-source image encoders reduces the knowledge required to craft AEs.
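
To make the idea of an imperceptible image-space adversarial example concrete, here is a minimal toy sketch. It is not the paper's attack: a random linear map `W` stands in for an image encoder, and the function name, dimensions, and step sizes are all illustrative assumptions. It only shows the general pattern of nudging an input within a small L-infinity budget so that its embedding moves toward a chosen target embedding.

```python
import numpy as np


def craft_feature_space_ae(x, target_emb, W, eps=8 / 255, step=1 / 255, iters=50):
    """Toy projected sign-gradient descent (hypothetical helper).

    Minimizes ||W @ x_adv - target_emb||^2 subject to
    ||x_adv - x||_inf <= eps and x_adv in [0, 1].
    W is a stand-in linear "encoder", not a real image encoder.
    """
    x_adv = x.copy()
    for _ in range(iters):
        residual = W @ x_adv - target_emb        # embedding mismatch
        grad = 2.0 * W.T @ residual              # gradient of the squared error
        x_adv = x_adv - step * np.sign(grad)     # signed gradient step
        x_adv = np.clip(x_adv, x - eps, x + eps) # project back into the eps-ball
        x_adv = np.clip(x_adv, 0.0, 1.0)         # keep a valid pixel range
    return x_adv


# Synthetic data: a flattened 64-"pixel" input and an 8-dimensional embedding.
rng = np.random.default_rng(0)
dim_pixels, dim_emb = 64, 8
W = rng.normal(size=(dim_emb, dim_pixels)) / np.sqrt(dim_pixels)
x_benign = rng.uniform(0.2, 0.8, size=dim_pixels)
x_other = rng.uniform(0.2, 0.8, size=dim_pixels)
target_emb = W @ x_other

eps = 8 / 255
x_adv = craft_feature_space_ae(x_benign, target_emb, W, eps=eps)

dist_before = np.linalg.norm(W @ x_benign - target_emb)
dist_after = np.linalg.norm(W @ x_adv - target_emb)
max_change = np.max(np.abs(x_adv - x_benign))
print(f"embedding distance: {dist_before:.4f} -> {dist_after:.4f}")
print(f"max per-pixel change: {max_change:.4f} (budget {eps:.4f})")
```

The sketch needs gradients of the encoder, which is why reliance on a publicly available encoder matters: with the weights in hand, this kind of optimization requires no access to the rest of the generation pipeline.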

Extensive experiments verify the technical feasibility of the hijacking attack.

In light of the revealed threat, we investigate several existing defenses and explore combining the IP-Adapter with adversarially trained models to overcome existing defenses' limitations.

Our code is available at https://github.com/fhdnskfbeuv/attackIPA.

Published on arXiv on: 2025-04-08T09:20:29Z